Comments

997 Comments

This should not be pushed stable unless timedatex is dropped from the server-product group in comps. If systemd obsoletes timedatex while it is still a "mandatory" package in comps, anaconda gives up and Server cannot be installed. That is what the current Rawhide tests show, where this change has already landed, and it has broken Server.

I'll send PRs for F31 and F32 comps, but until the F31 one gets reviewed and merged, this should not be pushed.

The openQA tests here seem to have triggered a glib bug. When testing an update, openQA boots to a clean desktop using a pre-rolled base disk image, then switches to a console, sets up a repo containing the packages from the update, and runs dnf update. That both brings the system fully up to date with current stable and installs any packages from the update that are part of the installed package set. During that run, the GNOME session that was running in the background crashed, with the error message from that glib issue: _ip_get_path_for_wd: assertion failed: (wd >= 0).
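For anyone wanting to imitate that update step by hand, here is a minimal sketch of the "set up a repo, then run dnf update" part. It is not the actual openQA test code. The repo id, the directory /opt/update_repo, the use of createrepo_c, and gpgcheck=0 are all assumptions made for illustration.

```python
from typing import List


def render_repo_file(repo_id: str, repo_dir: str) -> str:
    """Return the text of a dnf .repo file pointing at a local directory.

    repo_id and repo_dir are hypothetical names chosen by the caller.
    """
    if not repo_dir.startswith("/"):
        raise ValueError("repo_dir must be an absolute path")
    lines = [
        f"[{repo_id}]",
        f"name=Packages from update {repo_id}",
        f"baseurl=file://{repo_dir}",
        "enabled=1",
        # Assumption: locally downloaded packages are not signature-checked.
        "gpgcheck=0",
    ]
    return "\n".join(lines) + "\n"


def update_commands(repo_dir: str) -> List[List[str]]:
    """Commands to run, in order, once the packages are in repo_dir."""
    return [
        ["createrepo_c", repo_dir],  # generate repo metadata (assumed tool)
        ["dnf", "-y", "update"],     # update from stable plus the new repo
    ]


if __name__ == "__main__":
    print(render_repo_file("advisory", "/opt/update_repo"))
    for cmd in update_commands("/opt/update_repo"):
        print(" ".join(cmd))
```

The rendered text would go in a file under /etc/yum.repos.d/ before running the commands.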

I'm not sure if there's anything about the pygobject update that would trigger that crash specifically, or if it was sheer bad luck that this failed twice in a row on the same update test for this update.

OK, so it got weirdly harder to reproduce the crash - I had a bunch of passed tests with both the original update and the scratch build - but I finally got a crash with a full backtrace:

That's with the later scratch build, 37740210.

Yeah, same result in openQA.

Current status: @mreynolds did a scratch build with a potential fix, but it still crashed. I am now knee-deep in hacking up the test to try and attach gdb to ns-slapd and get a backtrace out of it.
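One way to get a backtrace out of a running ns-slapd, as a rough sketch rather than what the test actually does: attach gdb in batch mode, let the process continue, and dump all threads when it stops on the fatal signal. The helper name is hypothetical, and the PID would have to come from something like pidof or systemctl.

```python
import shlex
from typing import List


def gdb_attach_command(pid: int) -> List[str]:
    """Build a gdb command line that attaches to pid and waits for a crash.

    In batch mode gdb runs the -ex commands in order: "continue" resumes the
    process and returns when it stops (for example on SIGSEGV), then the
    backtrace command prints every thread's stack.
    """
    return [
        "gdb", "-p", str(pid), "-batch",
        "-ex", "continue",
        "-ex", "thread apply all bt full",
    ]


if __name__ == "__main__":
    # 7901 is only an example PID.
    print(shlex.join(gdb_attach_command(7901)))
```

The backtrace is only useful if the matching debuginfo packages are installed on the test machine.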

Issue definitely seems to be fixed on my local system at least.

BZ#1751372 [Wayland] [regression] After updating to version 69, switching between tabs doesn't always update the window's contents

Oh, yikes, I just noticed it's actually crashing, isn't it?

Sep 17 18:14:21 ipa001.domain.local audit[7901]: ANOM_ABEND auid=4294967295 uid=389 gid=389 ses=4294967295 subj=system_u:system_r:dirsrv_t:s0 pid=7901 comm="ns-slapd" exe="/usr/sbin/ns-slapd" sig=11 res=1
Sep 17 18:14:21 ipa001.domain.local kernel: show_signal_msg: 56 callbacks suppressed
Sep 17 18:14:21 ipa001.domain.local kernel: ns-slapd[7932]: segfault at 303735d1 ip 00007f76bd8e5db4 sp 00007f76a5df5cd8 error 4 in libslapd.so.0.1.0[7f76bd8d1000+b2000]
Sep 17 18:14:21 ipa001.domain.local kernel: Code: 8d 35 6d e0 09 00 e8 db 50 ff ff b8 ff ff ff ff e9 cb fd ff ff e8 8c 56 ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa <8b> 87 a0 00 00 00 21 f0 c3 0f 1f 00 f3 0f 1e fa 09 b7 a0 00 00 00
Sep 17 18:14:21 ipa001.domain.local named-pkcs11[8339]: LDAP error: Can't contact LDAP server: ldap_sync_poll() failed
Sep 17 18:14:21 ipa001.domain.local named-pkcs11[8339]: ldap_syncrepl will reconnect in 60 seconds
Sep 17 18:14:21 ipa001.domain.local systemd[1]: dirsrv@DOMAIN-LOCAL.service: Main process exited, code=killed, status=11/SEGV
Sep 17 18:14:21 ipa001.domain.local systemd[1]: dirsrv@DOMAIN-LOCAL.service: Failed with result 'signal'.
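Even without a core dump, the kernel segfault line gives a rough location. A small sketch that pulls out the instruction pointer and the mapping base and subtracts them to get an offset into libslapd.so.0.1.0. This assumes the mapping printed in brackets starts at the library's load base, which may not hold on every kernel, so treat the result as a hint only.

```python
import re
from typing import Optional, Tuple

SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def segfault_offset(line: str) -> Optional[Tuple[str, int]]:
    """Return (object name, ip - mapping base), or None if no match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    size = int(m["size"], 16)
    offset = ip - base
    # Sanity check: the instruction pointer should fall inside the mapping.
    if not 0 <= offset < size:
        return None
    return m["obj"], offset


if __name__ == "__main__":
    line = (
        "kernel: ns-slapd[7932]: segfault at 303735d1 ip 00007f76bd8e5db4 "
        "sp 00007f76a5df5cd8 error 4 in libslapd.so.0.1.0[7f76bd8d1000+b2000]"
    )
    obj, offset = segfault_offset(line)
    print(obj, hex(offset))
```

With debuginfo installed, the offset could then be passed to something like addr2line -e against the library to get a function name.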

It doesn't seem like we caught a core dump, though, unfortunately - I do have the openQA tests set up to try and capture any core dump caught by coredumpctl or abrt, but it doesn't seem to have found one. I'll have to poke it some more tomorrow and see if I can get a hold of the dump.
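A quick check for whether systemd-coredump caught anything for ns-slapd could look like the sketch below. This is not the openQA capture logic, just an illustration. It assumes coredumpctl exits non-zero when nothing matches, and the function names are made up.

```python
import shutil
import subprocess
from typing import List, Optional


def coredumpctl_list_cmd(comm: str) -> List[str]:
    """Command listing stored core dumps for a given process name."""
    return ["coredumpctl", "--no-pager", "list", comm]


def has_core_dump(comm: str) -> Optional[bool]:
    """Return True/False if coredumpctl could be asked, None if it is absent."""
    if shutil.which("coredumpctl") is None:
        return None
    result = subprocess.run(
        coredumpctl_list_cmd(comm),
        capture_output=True,
        text=True,
    )
    # Assumption: a non-zero exit status means no matching dump was found.
    return result.returncode == 0


if __name__ == "__main__":
    print(has_core_dump("ns-slapd"))
```

If this reports a dump, coredumpctl dump with an output file would be the next step to retrieve it.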

Note the failed web UI access attempt happens just five seconds after the service decides to shut down. So there's a timing element here; presumably that's why the earlier web UI access as admin works.

So, I fiddled with openQA a bit and got logs from the server. Note that I did get one successful run of the test, so it seems the bug doesn't happen every time. I now have 4 fails to 1 pass, though.

Here's the /var/log tarball from the server. It seems that the web UI error is due to the directory server not being available, which is kinda what I expected. The journal shows these errors for dirsrv@DOMAIN-LOCAL.service:

Sep 17 18:06:18 ipa001.domain.local ns-slapd[7901]: [17/Sep/2019:21:06:18.258867527 -0400] - ERR - dna-plugin - dna_update_shared_config - Unable to update shared config entry: dnaHostname=ipa001.domain.local+dnaPortNum=389,cn=posix-ids,cn=dna,cn=ipa,cn=etc,dc=domain,dc=local [error 21]
Sep 17 18:06:22 ipa001.domain.local ns-slapd[7901]: [17/Sep/2019:21:06:22.943741546 -0400] - ERR - dna-plugin - dna_update_shared_config - Unable to update shared config entry: dnaHostname=ipa001.domain.local+dnaPortNum=389,cn=posix-ids,cn=dna,cn=ipa,cn=etc,dc=domain,dc=local [error 21]
Sep 17 18:11:04 ipa001.domain.local ns-slapd[7901]: [17/Sep/2019:21:11:04.640917031 -0400] - ERR - dna-plugin - dna_update_shared_config - Unable to update shared config entry: dnaHostname=ipa001.domain.local+dnaPortNum=389,cn=posix-ids,cn=dna,cn=ipa,cn=etc,dc=domain,dc=local [error 21]
Sep 17 18:11:35 ipa001.domain.local ns-slapd[7901]: [17/Sep/2019:21:11:35.299014906 -0400] - ERR - dna-plugin - dna_update_shared_config - Unable to update shared config entry: dnaHostname=ipa001.domain.local+dnaPortNum=389,cn=posix-ids,cn=dna,cn=ipa,cn=etc,dc=domain,dc=local [error 21]
Sep 17 18:14:21 ipa001.domain.local systemd[1]: dirsrv@DOMAIN-LOCAL.service: Main process exited, code=killed, status=11/SEGV
Sep 17 18:14:21 ipa001.domain.local systemd[1]: dirsrv@DOMAIN-LOCAL.service: Failed with result 'signal'.

I'll ping more FreeIPA folks on this, but also - don't we usually have a policy that no changes that make the SELinux policy more restrictive are introduced after Beta freeze?

Rawhide KDE live yesterday (which has this build already) got the new background, so that looks good.

BZ#1749086 Needs updating for Fedora 31 background etc.

Note: same failure on prod and staging, so it doesn't look like a flake.

A FreeIPA test failed. It looks like the web UI errors out immediately after logging in as a regular domain user (though it works OK in a previous test step when logged in as the administrator). I'll try and look into this in more detail in a bit.

This update seems to be breaking FreeIPA replica deployment:

https://openqa.fedoraproject.org/tests/452181

I am not sure why yet, but it definitely seems to fail repeatedly on this update while passing for other updates. I'll look into it more tomorrow (I need logs from the master end, which I think aren't currently saved).

@ab

openQA tests look good, seems to be working OK in basic testing on my desktop. Still has the tty? bug though.

For the record, testing this myself it seems we're still missing something, as I don't get the correct F31 background for a desktop session. I get an upstream KDE background called 'Next' instead. However, this is still technically an improvement on the previous state, as it means we're at least not using the same background as F30 did, which means we're not violating the Basic release criteria any more.

This seems to have broken something about icons, and several openQA tests are failing. In the installer, a little 'x' icon that should appear in the text input area in the language selection screen is missing, replaced with some kind of error icon, an exclamation point in a triangle on a sheet of paper ( https://openqa.fedoraproject.org/tests/448238#step/_boot_to_anaconda/6 ). In the overview, the "Show Applications" button (a grid of nine white dots) is missing entirely.
