Update of Python and python3-docs to 3.9.9. Fixed python3-typed_ast for this Python version.

How to install

sudo dnf upgrade --refresh --advisory=FEDORA-2021-e152ce5f31

This update has been submitted for testing by thrnciar.

9 months ago

This update's test gating status has been changed to 'failed'.

9 months ago

This update's test gating status has been changed to 'waiting'.

9 months ago

This update's test gating status has been changed to 'failed'.

9 months ago

churchyard edited this update.

9 months ago
adamwill commented & provided feedback 9 months ago

The automated tests here are failing on FreeIPA shutdown. It appears to be a real failure: it's reproducible, and tests of other updates are not hitting it. The shutdown does eventually complete, but it takes far longer than it should.

The holdup seems to be related to httpd. In a sample test, the test runs systemctl stop ipa.service at 03:10:04, and it doesn't finish stopping until 03:11:39, so it takes 95 seconds to stop. The big jump in the logs is here:

Nov 09 11:10:07 ipa001.test.openqa.fedoraproject.org systemd[1]: pki-tomcatd@pki-tomcat.service: Consumed 41.396s CPU time.
Nov 09 11:11:19 ipa001.test.openqa.fedoraproject.org sssd_be[10225]: GSSAPI client step 1
Nov 09 11:11:19 ipa001.test.openqa.fedoraproject.org sssd_be[10225]: GSSAPI client step 1
Nov 09 11:11:19 ipa001.test.openqa.fedoraproject.org sssd_be[10225]: GSSAPI client step 1
Nov 09 11:11:19 ipa001.test.openqa.fedoraproject.org sssd_be[10225]: GSSAPI client step 2
Nov 09 11:11:36 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: State 'stop-sigterm' timed out. Killing.
Nov 09 11:11:36 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Killing process 9800 (httpd) with signal SIGKILL.
Nov 09 11:11:36 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Killing process 9802 (httpd) with signal SIGKILL.
Nov 09 11:11:36 ipa001.test.openqa.fedoraproject.org systemd[1]: httpd.service: Killing process 9803 (httpd) with signal SIGKILL.

From that it looks like systemd tries to stop httpd with SIGTERM around 11:10:06 and waits 90 seconds for it to exit; it doesn't, so systemd kills it with SIGKILL at 11:11:36.

/var/log/httpd/error_log ends with this line:

[Tue Nov 09 06:10:06.707303 2021] [mpm_event:notice] [pid 9800:tid 9800] AH00492: caught SIGWINCH, shutting down gracefully

which I think matches the SIGTERM at 11:10:06 in the journal: 11:10:06 is UTC (I used journalctl --utc), while 06:10:06 is system local time (the system is in US Eastern time). But it looks like it didn't actually shut down gracefully, and I can't find any messages indicating why.
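As a side note, the UTC/local-time match described above can be sanity-checked with the standard library (a quick sketch; the date and timezone are taken from the logs in this thread):

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+

# Journal timestamp, recorded in UTC (journalctl --utc)
journal_utc = datetime(2021, 11, 9, 11, 10, 6, tzinfo=ZoneInfo("UTC"))

# Convert to the system's local zone (US Eastern, per the test VM).
# On 2021-11-09, DST has ended, so Eastern is UTC-5.
local = journal_utc.astimezone(ZoneInfo("America/New_York"))
print(local.strftime("%H:%M:%S"))  # → 06:10:06
```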

I checked the timestamps on a test of a different update, also for F34; in that test, systemctl stop ipa.service returns after 8 seconds.

This update has been pushed to testing.

9 months ago

Bodhi is disabling automatic push to stable due to negative karma. The maintainer may push manually if they determine that the issue is not severe.

9 months ago

churchyard edited this update.

9 months ago

Thanks for the notice. I've disabled autopush on f33 as well.

churchyard edited this update.

New build(s):

  • python3-typed_ast-1.4.3-4.fc34

Karma has been reset.

9 months ago

This update has been submitted for testing by churchyard.

9 months ago

Adam, could you please provide a reproducer for the FreeIPA shutdown failure? Ideally via Bugzilla. Thanks.

Well, the test deploys a server and a client, runs through some tests, and then stops the service on the server. I don't know whether it's necessary to have a client and do some stuff with it, or whether the bug would reproduce just by deploying FreeIPA, letting it start up, and then trying to stop it...

Where can I see how the test does that? I have no idea how to deploy a server and a client. I am looking at https://fedoraproject.org/wiki/FreeIPA_step_by_step but it feels quite old, and it gives very ambiguous instructions like "make sure that the host names are set properly". It seems I need to know what I am trying to accomplish when I set up FreeIPA, and unfortunately I don't.

This update has been pushed to testing.

9 months ago

I'm looking into it. No results so far. I'm trying to find someone from the IPA team to help me with debugging.

@churchyard sorry, had some other fire to deal with today. The test case is probably more up to date and to the point: https://fedoraproject.org/wiki/QA:Testcase_freeipa_trust_server_installation . The test pretty much automates that.

You can see the code for any openQA "test module" (the individual bits of a test, like "_console_wait_login" and "role_deploy_domain_controller") by clicking on them at the test overview screen; e.g. https://openqa.fedoraproject.org/tests/1059611/modules/role_deploy_domain_controller/steps/1/src shows you the code of role_deploy_domain_controller. Sometimes these call library functions that the web UI can't show you, though. The test repo is https://pagure.io/fedora-qa/os-autoinst-distri-fedora/tree/master ; there you can find all the test code under tests/ and the libraries of shared functions under lib/. In this case, though, pretty much everything is in the test code itself, and you should see it matches the test case quite closely (it's Perl code, but most of what it's doing is just running console commands, so it should be relatively easy to follow).

"role_deploy_domain_controller_check" - the module that fails - first checks that ipa.service is running after deployment, then waits for the clients to complete; the next thing it does is the thing that fails (systemctl stop ipa.service).

I've filed an upstream issue (https://pagure.io/freeipa/issue/9034) and I'm gonna try to reproduce the problem in the meantime.

I've tried to reproduce the issue in a virtual machine with IPA installed as described in the wiki page, but I was not successful. The start of the service took ~40 s and the stop took ~6 s, no matter which version of Python I had installed. I'm gonna try it once more, carefully following the test case.

Ok, so now I have exactly the same environment, installed following the steps in role_deploy_domain_controller. I've compared the output of the commands in my VM with the screenshots, and they are almost the same. And I'm still not able to reproduce the issue: stopping the service takes around 6 seconds with both old and new Python.

I'm out of ideas. Is it possible to keep the testing virtual machine running after the failed test so we can connect there and debug the problem in the testing environment? Or start it manually somehow? Is there any way to get the list of installed packages, so I can compare them with my environment? I might have something different.

I suppose the other possibility is that we do need to run the client tests as well. The client tests do include one which exercises the FreeIPA web UI, and the shutdown issue is in httpd, so it's plausible we don't hit the bug unless the web UI gets exercised.

The openQA client test that exercises the web UI is the slightly misleadingly-named realmd_join_cockpit (it does enrol via Cockpit, but then it also goes on to do the web UI tests as well). Both the freeipa_webui and freeipa_password_change modules exercise the web UI to some extent. If you can add those steps to your reproducer attempt it may help. You may not need to have an actual second client VM, you may be able to just do it all on the server, I haven't tried that though.

Oh, sorry, to answer your questions:

"Is it possible to keep the testing virtual machine running after the failed test so we can connect there and debug the problem in the testing environment? Or start it manually somehow?"

The answer is sort of "theoretically yes, practically it's difficult". We can leave the test 'open' (at least for a couple of hours - after that it would time out, I think there's a setting to override that though), but the tricky part is connecting to it. The official openQA instance runs inside Fedora's infrastructure, which is quite heavily firewalled, it would be rather difficult to get things configured such that you could route a VNC connection all the way in there, I think. RH IT is in charge of some of the firewalling, too, so getting any changes made involves filing a ticket and waiting for weeks.

The other option is to reproduce the bug on an instance of openQA sited somewhere more convenient for connecting to, but setting up a pet openQA deployment is kind of a drag, and it's especially a drag for these tests where we have a server/client setup going on (because it involves special networking configuration). I used to keep a pet deployment in my home office but I got rid of it last year. @lruzicka has one, I think, but I don't know if his is set up to do the client/server tests.

What I usually wind up doing instead is just rewriting the openQA tests on the fly to add whatever debugging steps we need. If you can suggest some stuff you'd like done from the broken state that might help identify what the problem is, I can usually manage to tweak the test environment to do it and get the info out, so just let me know.

"Is there any way how to get the list of installed packages to compare them with my environment?"

Yeah, that we can do quite easily, I'll grab it for you later today.

Thanks for your help. Now I'm able to reproduce the issue. The only thing I need to do is log out of and back into the FreeIPA web UI, and then it takes more than 90 seconds to stop the web server, even when I do it directly via the systemctl stop httpd or apachectl stop commands.

Unfortunately, the log of httpd does not provide anything useful, because it does not generate any messages after the SIGKILL signal.

[Mon Nov 15 09:51:38.171812 2021] [core:info] [pid 8369:tid 8369] AH00096: removed PID file /etc/httpd/run/httpd.pid (pid=8369)
[Mon Nov 15 09:51:38.171876 2021] [mpm_event:notice] [pid 8369:tid 8369] AH00492: caught SIGWINCH, shutting down gracefully
[Mon Nov 15 09:51:38.171916 2021] [mpm_event:debug] [pid 8378:tid 8378] event.c(569): wake up listener
[Mon Nov 15 09:51:38.172071 2021] [mpm_event:debug] [pid 8380:tid 8380] event.c(569): wake up listener
[Mon Nov 15 09:51:38.173648 2021] [mpm_event:debug] [pid 8379:tid 8379] event.c(569): wake up listener
[Mon Nov 15 09:51:38.174300 2021] [mpm_event:debug] [pid 8609:tid 8609] event.c(569): wake up listener
[Mon Nov 15 09:51:38.182319 2021] [ssl:debug] [pid 8380:tid 8504] ssl_engine_io.c(1148): [client 10.0.0.129:43366] AH02001: Connection closed to child 128 with standard shutdown (server test.openqa.fedoraproject.org:443)
[Mon Nov 15 09:51:38.184108 2021] [ssl:debug] [pid 8379:tid 8478] ssl_engine_io.c(1148): [client 10.0.0.129:43360] AH02001: Connection closed to child 64 with standard shutdown (server test.openqa.fedoraproject.org:443)
[Mon Nov 15 09:51:38.187906 2021] [http2:debug] [pid 8609:tid 8609] h2_workers.c(326): h2_workers: cleanup 2 workers idling
[Mon Nov 15 09:51:38.188243 2021] [http2:debug] [pid 8378:tid 8378] h2_workers.c(326): h2_workers: cleanup 2 workers idling
[Mon Nov 15 09:51:38.188538 2021] [ssl:debug] [pid 8380:tid 8504] ssl_engine_io.c(1148): [client 10.0.0.129:43362] AH02001: Connection closed to child 128 with standard shutdown (server test.openqa.fedoraproject.org:443)
[Mon Nov 15 09:51:38.189110 2021] [http2:debug] [pid 8378:tid 8378] h2_workers.c(356): h2_workers: cleanup all workers terminated
[Mon Nov 15 09:51:38.189467 2021] [http2:debug] [pid 8378:tid 8378] h2_workers.c(359): h2_workers: cleanup zombie workers joined
[Mon Nov 15 09:51:38.189618 2021] [wsgi:info] [pid 8378:tid 8378] mod_wsgi (pid=8378): Destroying interpreters.
[Mon Nov 15 09:51:38.189643 2021] [wsgi:info] [pid 8378:tid 8378] mod_wsgi (pid=8378): Cleanup interpreter ''.
[Mon Nov 15 09:51:38.190201 2021] [http2:debug] [pid 8609:tid 8609] h2_workers.c(356): h2_workers: cleanup all workers terminated
[Mon Nov 15 09:51:38.190468 2021] [http2:debug] [pid 8609:tid 8609] h2_workers.c(359): h2_workers: cleanup zombie workers joined
[Mon Nov 15 09:51:38.190633 2021] [wsgi:info] [pid 8609:tid 8609] mod_wsgi (pid=8609): Destroying interpreters.
[Mon Nov 15 09:51:38.190648 2021] [wsgi:info] [pid 8609:tid 8609] mod_wsgi (pid=8609): Cleanup interpreter ''.
[Mon Nov 15 09:51:38.192098 2021] [wsgi:info] [pid 8609:tid 8609] mod_wsgi (pid=8609): Terminating Python.
[Mon Nov 15 09:51:38.195954 2021] [wsgi:info] [pid 8609:tid 8609] mod_wsgi (pid=8609): Python has shutdown.
[Mon Nov 15 09:51:38.198071 2021] [http2:debug] [pid 8380:tid 8380] h2_workers.c(326): h2_workers: cleanup 2 workers idling
[Mon Nov 15 09:51:38.198140 2021] [http2:debug] [pid 8380:tid 8380] h2_workers.c(356): h2_workers: cleanup all workers terminated
[Mon Nov 15 09:51:38.198308 2021] [http2:debug] [pid 8380:tid 8380] h2_workers.c(359): h2_workers: cleanup zombie workers joined
[Mon Nov 15 09:51:38.198363 2021] [wsgi:info] [pid 8380:tid 8380] mod_wsgi (pid=8380): Destroying interpreters.
[Mon Nov 15 09:51:38.198369 2021] [wsgi:info] [pid 8380:tid 8380] mod_wsgi (pid=8380): Cleanup interpreter ''.
[Mon Nov 15 09:51:38.199229 2021] [wsgi:info] [pid 8380:tid 8380] mod_wsgi (pid=8380): Terminating Python.
[Mon Nov 15 09:51:38.203173 2021] [watchdog:debug] [pid 8379:tid 8382] mod_watchdog.c(257): AH02973: Singleton Watchdog (_proxy_hcheck_) stopping
[Mon Nov 15 09:51:38.203612 2021] [http2:debug] [pid 8379:tid 8379] h2_workers.c(326): h2_workers: cleanup 2 workers idling
[Mon Nov 15 09:51:38.203692 2021] [http2:debug] [pid 8379:tid 8379] h2_workers.c(356): h2_workers: cleanup all workers terminated
[Mon Nov 15 09:51:38.203886 2021] [http2:debug] [pid 8379:tid 8379] h2_workers.c(359): h2_workers: cleanup zombie workers joined
[Mon Nov 15 09:51:38.203972 2021] [wsgi:info] [pid 8379:tid 8379] mod_wsgi (pid=8379): Destroying interpreters.
[Mon Nov 15 09:51:38.203979 2021] [wsgi:info] [pid 8379:tid 8379] mod_wsgi (pid=8379): Destroy interpreter 'test.openqa.fedoraproject.org|/ipa/wsgi/plugins.py'.
[Mon Nov 15 09:51:38.204094 2021] [wsgi:info] [pid 8379:tid 8379] mod_wsgi (pid=8379): End interpreter 'test.openqa.fedoraproject.org|/ipa/wsgi/plugins.py'.
[Mon Nov 15 09:51:38.204991 2021] [wsgi:info] [pid 8378:tid 8378] mod_wsgi (pid=8378): Terminating Python.
[Mon Nov 15 09:51:38.209249 2021] [wsgi:info] [pid 8378:tid 8378] mod_wsgi (pid=8378): Python has shutdown.
[Mon Nov 15 09:51:38.211236 2021] [watchdog:debug] [pid 8380:tid 8399] mod_watchdog.c(257): AH02973: Singleton Watchdog (_proxy_hcheck_) stopping
[Mon Nov 15 09:51:38.213406 2021] [wsgi:info] [pid 8380:tid 8380] mod_wsgi (pid=8380): Python has shutdown.

I'm gonna try strace.

I'm still not sure where the problem is, and I'm reaching my debugging limits, but this strace output seems to be important:

Old Python:

[pid  9336] 10:06:23 openat(AT_FDCWD, "/proc/thread-self/attr/fscreate", O_RDWR|O_CLOEXEC) = 4
[pid  9336] 10:06:23 write(4, NULL, 0)  = 0
[pid  9336] 10:06:23 close(4)           = 0
[pid  9336] 10:06:23 newfstatat(AT_FDCWD, "/run/systemd/ask-password", {st_mode=S_IFDIR|0775, st_size=40, ...}, 0) = 0
[pid  9336] 10:06:23 rt_sigprocmask(SIG_SETMASK, [TERM], NULL, 8) = 0
[pid  9336] 10:06:23 signalfd4(-1, [TERM], 8, SFD_CLOEXEC|SFD_NONBLOCK) = 4
[pid  9336] 10:06:23 inotify_init1(IN_CLOEXEC) = 5
[pid  9336] 10:06:23 inotify_add_watch(5, "/run/systemd/ask-password", IN_CLOSE_WRITE|IN_MOVED_TO) = 1
[pid  9336] 10:06:23 openat(AT_FDCWD, "/run/systemd/ask-password", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 6
[pid  9336] 10:06:23 newfstatat(6, "", {st_mode=S_IFDIR|0775, st_size=40, ...}, AT_EMPTY_PATH) = 0
[pid  9336] 10:06:23 getdents64(6, 0x55813e5b29c0 /* 2 entries */, 32768) = 48
[pid  9336] 10:06:23 getdents64(6, 0x55813e5b29c0 /* 0 entries */, 32768) = 0
[pid  9336] 10:06:23 close(6)           = 0
[pid  9336] 10:06:23 ppoll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 2, NULL, NULL, 8 <unfinished ...>
[pid  9335] 10:07:52 <... ppoll resumed>) = 1 ([{fd=3, revents=POLLIN}])
[pid  9335] 10:07:52 recvmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1\213\3\0\0\f\0\0\0\266\0\0\0\1\1o\0.\0\0\0", iov_len=24}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 24
[pid  9335] 10:07:52 recvmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="/org/freedesktop/systemd1/unit/h"..., iov_len=1083}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 1083
[pid  9335] 10:07:52 recvmsg(3, {msg_namelen=0}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
[pid  9335] 10:07:52 ppoll([{fd=3, events=POLLIN}], 1, NULL, NULL, 8) = 1 ([{fd=3, revents=POLLIN}])

New Python:

[pid 10340] 10:24:43 openat(AT_FDCWD, "/proc/thread-self/attr/fscreate", O_RDWR|O_CLOEXEC) = 4
[pid 10340] 10:24:43 write(4, NULL, 0)  = 0
[pid 10340] 10:24:43 close(4)           = 0
[pid 10340] 10:24:43 newfstatat(AT_FDCWD, "/run/systemd/ask-password", {st_mode=S_IFDIR|0775, st_size=40, ...}, 0) = 0
[pid 10340] 10:24:43 rt_sigprocmask(SIG_SETMASK, [TERM], NULL, 8) = 0
[pid 10340] 10:24:43 signalfd4(-1, [TERM], 8, SFD_CLOEXEC|SFD_NONBLOCK) = 4
[pid 10340] 10:24:43 inotify_init1(IN_CLOEXEC) = 5
[pid 10340] 10:24:43 inotify_add_watch(5, "/run/systemd/ask-password", IN_CLOSE_WRITE|IN_MOVED_TO) = 1
[pid 10340] 10:24:43 openat(AT_FDCWD, "/run/systemd/ask-password", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 6
[pid 10340] 10:24:43 newfstatat(6, "", {st_mode=S_IFDIR|0775, st_size=40, ...}, AT_EMPTY_PATH) = 0
[pid 10340] 10:24:43 getdents64(6, 0x56247fdb59c0 /* 2 entries */, 32768) = 48
[pid 10340] 10:24:43 getdents64(6, 0x56247fdb59c0 /* 0 entries */, 32768) = 0
[pid 10340] 10:24:43 close(6)           = 0
[pid 10340] 10:24:43 ppoll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 2, NULL, NULL, 8 <unfinished ...>
[pid 10339] 10:24:44 <... ppoll resumed>) = 1 ([{fd=3, revents=POLLIN}])
[pid 10339] 10:24:44 recvmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1\213\3\0\0\f\0\0\0\266\0\0\0\1\1o\0.\0\0\0", iov_len=24}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 24
[pid 10339] 10:24:44 recvmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="/org/freedesktop/systemd1/unit/h"..., iov_len=1083}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 1083
[pid 10339] 10:24:44 recvmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1\254\3\0\0\r\0\0\0\266\0\0\0\1\1o\0.\0\0\0", iov_len=24}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 24
[pid 10339] 10:24:44 recvmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="/org/freedesktop/systemd1/unit/h"..., iov_len=1116}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 1116
[pid 10339] 10:24:44 recvmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1I\0\0\0\16\0\0\0\242\0\0\0\1\1o\0\31\0\0\0", iov_len=24}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 24
[pid 10339] 10:24:44 recvmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="/org/freedesktop/systemd1\0\0\0\0\0\0\0"..., iov_len=233}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 233
[pid 10339] 10:24:44 sendmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\1\4\0012\0\0\0\4\0\0\0\240\0\0\0\1\1o\0.\0\0\0/org/fre"..., iov_len=176}, {iov_base="\35\0\0\0org.freedesktop.systemd1.Uni"..., iov_len=50}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 226
[pid 10339] 10:24:44 recvmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1\213\3\0\0\17\0\0\0\266\0\0\0\1\1o\0.\0\0\0", iov_len=24}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 24
[pid 10339] 10:24:44 recvmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="/org/freedesktop/systemd1/unit/h"..., iov_len=1083}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 1083
[pid 10339] 10:24:44 recvmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1\214\3\0\0\20\0\0\0\266\0\0\0\1\1o\0.\0\0\0", iov_len=24}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 24
[pid 10339] 10:24:44 recvmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="/org/freedesktop/systemd1/unit/h"..., iov_len=1084}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 1084
[pid 10339] 10:24:44 recvmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1G\0\0\0\21\0\0\0\240\0\0\0\1\1o\0\31\0\0\0", iov_len=24}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 24
[pid 10339] 10:24:44 recvmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="/org/freedesktop/systemd1\0\0\0\0\0\0\0"..., iov_len=223}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 223
[pid 10339] 10:24:44 recvmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1R\0\0\0\22\0\0\0\240\0\0\0\1\1o\0\31\0\0\0", iov_len=24}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 24
[pid 10339] 10:24:44 recvmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="/org/freedesktop/systemd1\0\0\0\0\0\0\0"..., iov_len=234}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 234
[pid 10339] 10:24:44 recvmsg(3, {msg_namelen=0}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)

The strace logs are switched - there is, of course, no deadlock in the old Python. Sorry for that.

It's unclear from the web server log when the shutdown was requested. Do you see in those logs a request to stop, with the next log line coming ~90 seconds later? We sort of see that in the apachectl log, but it's waiting on a poll, so it isn't particularly enlightening. It appears to be just waiting for Apache to stop.

I did a time-consuming bisection of commits in Python between 3.9.7 and 3.9.8, and now I can point to the specific commit causing this issue. The winner is: https://github.com/python/cpython/commit/1ecb641b887af2feb026a266e2fceedee0815ca8

I don't understand mod_wsgi well enough to see the connection between the commit and the issue it causes, but I'm a bit closer to the finish line.

@vstinner is the author of the code so we might be able to debug it together.

Seems like Python 3.9.8 introduced a regression in argparse which causes complex command-line tools to fail to recognize sub-commands properly. You can find more info here: https://bugs.python.org/issue45235
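For readers unfamiliar with the affected feature: the regression concerned argparse's sub-command handling. Here is a minimal, self-contained sketch of the pattern such tools use (the tool, sub-command, and option names are made up for illustration; see the linked issue for the actual failure):

```python
import argparse

# A hypothetical CLI with sub-commands, the pattern affected by the regression.
parser = argparse.ArgumentParser(prog="tool")
parser.add_argument("--verbose", action="store_true")
subparsers = parser.add_subparsers(dest="command", required=True)

# Each sub-command gets its own parser with its own options.
build = subparsers.add_parser("build")
build.add_argument("--target", default="all")

args = parser.parse_args(["--verbose", "build", "--target", "docs"])
print(args.command, args.target, args.verbose)
```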

Upstream has already published the hotfix release 3.9.9. I think we shouldn't ship 3.9.8 at all, and should instead do a 3.9.9 update where we can revert the commit causing FreeIPA to fail. WDYT?

I've tried to bisect the commit again because I wasn't able to reproduce the issue with reverted commit 1ecb641b887af2feb026a266e2fceedee0815ca8 and now it seems that the issue is caused by an earlier commit: https://github.com/python/cpython/commit/94d19f606fa18a1c4d2faca1caf2f470a8ce6d46

Also, it seems that the only thing one needs to do to reproduce the issue is: start the web server → log in to the FreeIPA web UI → log out from the web UI → stop the web server.
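To quantify "more than 90 seconds", a trivial timing wrapper is handy (a sketch; on the affected machine you would time the systemctl call shown in the comment, which is not runnable here):

```python
import subprocess
import sys
import time

def timed_run(cmd):
    """Run a command and return (returncode, elapsed seconds)."""
    t0 = time.monotonic()
    proc = subprocess.run(cmd)
    return proc.returncode, time.monotonic() - t0

# On the affected machine you would call, e.g.:
#   timed_run(["systemctl", "stop", "httpd"])
# and expect >90 s on the broken Python. Demonstrate with a harmless command:
rc, secs = timed_run([sys.executable, "-c", "pass"])
print(rc, secs < 90)
```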

It makes sense to try Python 3.9.9, even though I think this problem has nothing to do with argparse. I'm gonna take a look at the changed files in the new release; I might find something else interesting as well.

By the way, I've also tried to set up a simple Python WSGI application, and I'm not able to reproduce the same issue with it. From the FreeIPA configuration, it also seems that FreeIPA does not use threads in mod_wsgi but does some threading in its own code, so the issue might be related to FreeIPA and not to mod_wsgi in general.
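For reference, a "simple Python WSGI application" of the kind used for such a check looks roughly like this (a sketch, exercised in-process with wsgiref's test helpers rather than under httpd/mod_wsgi):

```python
from wsgiref.util import setup_testing_defaults

def application(environ, start_response):
    """Minimal WSGI app: answer every request with a plain-text body."""
    body = b"hello\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]

# Exercise the app in-process instead of deploying it under mod_wsgi:
environ = {}
setup_testing_defaults(environ)  # fill in a plausible request environment
collected = {}

def start_response(status, headers):
    collected["status"] = status

result = application(environ, start_response)
print(collected["status"], b"".join(result))
```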

After one more day of debugging, I've created an upstream issue because it does not seem to me that the problem is in FreeIPA: https://github.com/GrahamDumpleton/mod_wsgi/issues/730

Thanks a lot for all the work on this (and the continuing work in the GitHub issue)! Sorry I went AWOL after that last comment - I didn't manage to get the package list that day, and then I was on vacation. Seems like you got along fine without me, though. :D

churchyard edited this update.

New build(s):

  • python3.9-3.9.9-2.fc34

Removed build(s):

  • python3.9-3.9.8-1.fc34

Karma has been reset.

9 months ago

This update has been submitted for testing by churchyard.

9 months ago

churchyard edited this update.

New build(s):

  • python3-docs-3.9.9-1.fc34

Removed build(s):

  • python3-docs-3.9.8-1.fc34

Karma has been reset.

9 months ago

This update's test gating status has been changed to 'passed'.

9 months ago

This update has been pushed to testing.

9 months ago

churchyard edited this update.

9 months ago

We have reverted the problematic commit. The test passed.

lbalhar commented & provided feedback 9 months ago

I've also verified it manually.

t3rm1n4l commented & provided feedback 8 months ago

wfm

This update can be pushed to stable now if the maintainer wishes

8 months ago
wolnei commented & provided feedback 8 months ago

Works for me.

This update has been submitted for stable by bodhi.

8 months ago

This update has been pushed to stable.

8 months ago


Metadata
Type
enhancement
Karma
3
Signed
Content Type
RPM
Test Gating
Settings
Unstable by Karma
-3
Stable by Karma
5
Stable by Time
14 days
Dates
submitted
9 months ago
in testing
9 months ago
in stable
8 months ago
modified
9 months ago
