Comments

3360 Comments
openQA shows the same "Ooops!" failure when installing updates as it did for the Rawhide update. Notably, F43 does not have the PackageKit dnf5 backend change, so it looks like that can't be the cause.

The failure here seems to be a genuine one, but it is hard to debug. The update test failed the same way each time: the update transaction seems to complete, then the right pane immediately goes blank grey and "Ooops!" appears in red text at top right. It did so on both aarch64 and x86_64, on both prod and staging, twice each (so eight times in total). AFAICT the test has never failed that way on another update before or since, so it definitely seems to be caused by this update.

Note openQA is currently using a scratch build I did to deal with https://github.com/cockpit-project/cockpit/pull/22800 while awaiting the release of this new version. The scratch build was the same as the 354-2.fc44 official build but with the initial, simple version of my PR - not Martin's improved version - backported. It did not have any other changes.

I'll try and find time to see if I can reproduce this manually later and find any more information about it.

The shortened timeout was an intentional change, so I've been trying to adjust the tests to account for it, but it's turned out to be really difficult because of two other bugs I found. These aren't new bugs but the timeout changing from 60 seconds to 10 seconds made them much more disruptive for the openQA tests:

It's very difficult / impossible to get the tests to behave correctly without at least one of those being fixed, so I'm going to leave this blocked for now.

There's also a third bonus bug that doesn't really affect openQA but which I noticed while verifying the other two: Login screen never goes to idle mode if there is a character in the password field or the username field is visible

Oddly, in roughly the last 24 hours I've started intermittently seeing FreeIPA / AD client tests fail because the test switches to tty3 and sees a blank screen with a flashing cursor. It doesn't happen every time, only sometimes, and it never happened before. It's possible the kmscon update alone (which wasn't gated, so it's stable now) caused this somehow, I guess? But I haven't been able to dig into it in detail yet; it might be some other cause entirely...

That package is on installer images, btw, so that's a critical issue. That's why the openQA tests failed.

Per the rmdepcheck result, this is missing a rebuild of python-pycdio:

Dependencies of other packages that would be BROKEN by the tested packages:
package: python3-pycdio-2.1.1-8.fc44.x86_64 from https://kojipkgs.fedoraproject.org/repos/f44-build/latest/x86_64
  libiso9660.so.11()(64bit)
  libiso9660.so.11(ISO9660_11)(64bit)
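
A report like the one above can be turned into a rebuild list mechanically. This is a minimal sketch, assuming the report keeps the layout quoted in this comment (a `package:` line followed by indented requirement lines). `parse_broken_deps` is a hypothetical helper written for illustration; it is not part of rmdepcheck, and the real output format may differ.

```python
import re

# Sample copied from the comment above. The layout is assumed from this
# excerpt only, not from rmdepcheck's documentation.
SAMPLE = """\
Dependencies of other packages that would be BROKEN by the tested packages:
package: python3-pycdio-2.1.1-8.fc44.x86_64 from https://kojipkgs.fedoraproject.org/repos/f44-build/latest/x86_64
  libiso9660.so.11()(64bit)
  libiso9660.so.11(ISO9660_11)(64bit)
"""


def parse_broken_deps(report: str) -> dict[str, list[str]]:
    """Map each 'package:' entry to the indented requirements listed under it."""
    broken: dict[str, list[str]] = {}
    current = None
    for line in report.splitlines():
        match = re.match(r"package:\s+(\S+)", line)
        if match:
            # Start collecting requirements for a new package entry.
            current = match.group(1)
            broken[current] = []
        elif current is not None and line[:1].isspace() and line.strip():
            # Indented, non-empty line: a requirement that would break.
            broken[current].append(line.strip())
        else:
            # Header or blank line: stop attributing lines to a package.
            current = None
    return broken


if __name__ == "__main__":
    for package, requires in parse_broken_deps(SAMPLE).items():
        print(package, "->", ", ".join(requires))
```

Each key in the result is a binary package whose source package (here python-pycdio) would need a rebuild alongside the update.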

Ugh, there's definitely some kind of weird intermittent issue with tty3 not spawning in AD/FreeIPA enrolment tests, but I don't think it has anything to do with this update. Sigh.

Yeah, I only waived the frr test.

Yeah, that's why it's odd. Must have been a very precise timing issue of some kind.

It looks like update.install_default_update_live and update.realmd_join_sssd_ad ended up failing this time around. I still find the output of these tests inscrutable, though.

The deal is that we (Quality team) look at failures and translate them for you, unless you want to be really keen. ;) In this case install_default_update_live was just a test flake. I think realmd_join_sssd_ad was also a flake - I've restarted it - though it failed the same odd way twice in a row, which is concerning. What happened is the test did ctrl-alt-f3 to get a fresh terminal, and it got a blank screen instead.

So all the test failures here are because, on boot, plasma-login shows the screen with the username and the password prompt for just ten seconds before switching to the screen showing only the clock.

In openQA our 'boot to the login screen' logic includes a thing where we check we see the login screen, then wait ten seconds and check again. This is to avoid an old bug where (due to graphics memory buffering or something) we'd sometimes see a login screen frame from the previous boot during boot, then try to type into that and everything would go wrong.

I can tweak the openQA logic a bit (either get rid of the wait and hope that buffer bug is gone, or make it hit 'esc' before doing the second check or something) but the ten second timeout seems really aggressive? Is it intentional?
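
The interaction between the double check and the ten-second idle timeout can be sketched as follows. This is only an illustration in Python with hypothetical names (`wait_for_stable_login`, `FakeScreen`); the real openQA tests are written in Perl against the os-autoinst test API, and the actual logic may differ. The fake screen assumes that any key press resets the idle timer, which is an assumption rather than verified plasma-login behaviour.

```python
import time
from typing import Callable


def wait_for_stable_login(
    login_visible: Callable[[], bool],
    press_key: Callable[[str], None],
    sleep: Callable[[float], None] = time.sleep,
    settle_seconds: float = 10.0,
    wake_first: bool = True,
) -> bool:
    """Check for the login screen, wait, then check again.

    The second check guards against a stale frame from a previous boot.
    With wake_first=True, 'esc' is sent before the second check so a
    greeter that has gone idle shows the prompt again.
    """
    if not login_visible():
        return False
    sleep(settle_seconds)
    if wake_first:
        press_key("esc")
    return login_visible()


class FakeScreen:
    """Simulated greeter that hides the prompt after idle_after seconds."""

    def __init__(self, idle_after: float) -> None:
        self.idle_after = idle_after
        self.now = 0.0
        self.last_input = 0.0

    def sleep(self, seconds: float) -> None:
        self.now += seconds  # advance the simulated clock

    def press_key(self, key: str) -> None:
        self.last_input = self.now  # assumed: any key resets the idle timer

    def login_visible(self) -> bool:
        return self.now - self.last_input < self.idle_after


if __name__ == "__main__":
    old = FakeScreen(idle_after=10.0)
    print(wait_for_stable_login(old.login_visible, old.press_key,
                                sleep=old.sleep, wake_first=False))
    new = FakeScreen(idle_after=10.0)
    print(wait_for_stable_login(new.login_visible, new.press_key,
                                sleep=new.sleep, wake_first=True))
```

In this model the unmodified check fails whenever the idle timeout is no longer than the settle wait, while sending a key first keeps the stale-frame guard and still finds the prompt.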

Still the crash on startup.

Huh, weird that the gating status got to 'passed' there...oh, well. It's probably fine.

Testing both updates together makes things worse, if anything - the cloud and netinst image build tests now fail because when we do ctrl-alt-f3 near the end of the test, we get a blank screen.

Looks like the tests still fail even with the kmscon update available. Will try with both fedora-release and systemd updates.

Oh, I see there's actually a third update involved...that one is stable now, but was not when this update was created, by the looks of it. I can re-run the tests and see if they do any better now.

See my earlier comment. If the systemd and fedora-release changes are interdependent they should have been submitted as a multi-package update. This is in the updates policy.

As I said, I'll try and find a minute to test these two separate updates together on staging and see if they work better that way. If they do, they need to be rebuilt together on a side tag to produce a combined update that will pass testing.

The Cloud image displays a few boot messages before it stops updating the display. Systems installed from a network install image don't show anything after the "Booting <bootloader menu name>" message on x86_64; on aarch64 they show a lot of messages but never reach a VT. The difference there may be that plymouth defaults on for x86_64 but off for aarch64, IIRC.

In each case, switching to tty6 does actually give a working login prompt. Sorry, I may have implied that it didn't in my earlier message.

I'm not sure if things would be better or worse if this update and FEDORA-2026-b5d576a628 were combined? I can try testing them together on staging, I guess.

In openQA testing, this seems to break non-graphical boot. Graphical boot is fine and such tests can also later do ctrl-alt-f3 or ctrl-alt-f6 and get a working VT, but non-graphical boot appears to stop updating the display very early in the boot process and never display a login prompt (presumably this is kmsconvt misbehaving?)

Test failures are because this depends on FEDORA-2026-c58bd53922 but was not properly bundled with it. @praiskup you should do proper bundled updates even for Rawhide, otherwise you may encounter problems like this.

I'll re-run the tests; they should pass now that the other update is stable.