According to slub_debug=F-enabled logs (see [1] [2] below), there is at least one bug in ath11k:
...
...
Jun 05 18:56:20 fedora.fritz.box kernel: CPU: 1 PID: 13592 Comm: kworker/u32:6 Tainted: G B 6.3.5-200.fc38.x86_64 #1
Jun 05 18:56:20 fedora.fritz.box kernel: Slab 0xffffeffd4d324000 objects=32 used=10 fp=0xffff8fd10c901400 flags=0x17ffffc0010200(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
Jun 05 18:56:20 fedora.fritz.box kernel: -----------------------------------------------------------------------------
Jun 05 18:56:20 fedora.fritz.box kernel: BUG kmalloc-1k (Tainted: G B ): Wrong object count. Counter is 10 but counted were 28
Jun 05 18:56:20 fedora.fritz.box kernel: =============================================================================
Jun 05 18:56:20 fedora.fritz.box kernel: Disabling lock debugging due to kernel taint
...
...
...
...
...
Jun 05 18:56:20 fedora.fritz.box kernel: Object 0xffff8fd10c902000 @offset=8192 fp=0xc5d6e3752d901092
Jun 05 18:56:20 fedora.fritz.box kernel: Slab 0xffffeffd4d324000 objects=32 used=10 fp=0xffff8fd10c901400 flags=0x17ffffc0010200(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
Jun 05 18:56:20 fedora.fritz.box kernel: -----------------------------------------------------------------------------
Jun 05 18:56:20 fedora.fritz.box kernel: BUG kmalloc-1k (Not tainted): Freechain corrupt
Jun 05 18:56:20 fedora.fritz.box kernel: =============================================================================
Jun 05 18:56:17 fedora.fritz.box kernel: ath11k_pci 0000:02:00.0: Failed to set the requested Country regulatory setting
Jun 05 18:56:17 fedora.fritz.box kernel: ath11k_pci 0000:02:00.0: Failed to set the requested Country regulatory setting
...
...
AFTER the above: cat /proc/sys/kernel/tainted = 32.
Normally: cat /proc/sys/kernel/tainted = 0.
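For readers who want to interpret the taint value: it is a bitmask, and 32 is bit 5, which corresponds to the "B" shown in the "Tainted: G B" lines above. A minimal sketch of the decoding follows; the function name `decode_taint` is my own, and the table covers only part of the flags listed in the kernel's tainted-kernels documentation.

```python
# Subset of taint bits (bit position -> letter, meaning), following the
# kernel's tainted-kernels documentation. Bits not listed here are
# reported generically by decode_taint().
TAINT_FLAGS = {
    0: ("P", "proprietary module was loaded"),
    1: ("F", "module was force loaded"),
    4: ("M", "processor reported a Machine Check Exception"),
    5: ("B", "bad page referenced or unexpected page flags"),
    7: ("D", "kernel died recently (OOPS or BUG)"),
    9: ("W", "kernel issued a warning"),
    12: ("O", "externally-built (out-of-tree) module was loaded"),
    13: ("E", "unsigned module was loaded"),
}


def decode_taint(value: int) -> list[str]:
    """Return one description per bit set in a /proc/sys/kernel/tainted value."""
    reasons = []
    bit = 0
    while value >> bit:
        if (value >> bit) & 1:
            letter, meaning = TAINT_FLAGS.get(bit, ("?", f"bit {bit} (not in this table)"))
            reasons.append(f"{letter}: {meaning}")
        bit += 1
    return reasons


if __name__ == "__main__":
    for reason in decode_taint(32):
        print(reason)
```

A value of 0 yields an empty list, which matches the "Normally" case above.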
[1] full logs of the above: https://gitlab.com/py0xc31/public-tmp-storage/-/blob/main/slub_debug-F/HIT/slub_debug_HIT.log
[2] related bug report: https://bugzilla.redhat.com/show_bug.cgi?id=2193110
Given the different behaviors of the "occurrences" on my system, it remains unclear if this bug is the only one. I will keep logging.
@jforbes sorry, that was not my intention. I wasn't aware that negative karma is only intended for regressions that have not been reported before. In that case, the karma is positive for 6.3.5.
Something comparable to what @adonnen posted was also posted by another user on Ask.Fedora:
https://discussion.fedoraproject.org/t/fedora-hangs-on-boot-after-upgrading-to-kernel-6-3-4/83605
BZ#2193110 still appears on AMD Ryzen 6850 PRO.
The journalctl output is comparable to an earlier one (it is one of the less detailed); the last few minutes are:
May 30 23:57:49 fedora.domain kernel: #PF: error_code(0x0002) - not-present page
May 30 23:57:49 fedora.domain kernel: #PF: supervisor write access in kernel mode
May 30 23:57:49 fedora.domain kernel: BUG: unable to handle page fault for address: 00000000bde90000
May 30 23:57:16 fedora.domain wpa_supplicant[2508]: wlp2s0: CTRL-EVENT-REGDOM-CHANGE init=DRIVER type=COUNTRY alpha2=DE
May 30 23:57:08 fedora.domain wpa_supplicant[2508]: wlp2s0: CTRL-EVENT-REGDOM-CHANGE init=DRIVER type=COUNTRY alpha2=US
May 30 23:57:05 fedora.domain kernel: ath11k_pci 0000:02:00.0: Failed to set the requested Country regulatory setting
May 30 23:57:05 fedora.domain kernel: ath11k_pci 0000:02:00.0: Failed to set the requested Country regulatory setting
May 30 23:57:01 fedora.domain NetworkManager[2230]: <info> [1685483821.9970] device (wlp2s0): supplicant interface state: interface_disabled -> inactive
May 30 23:57:01 fedora.domain NetworkManager[2230]: <info> [1685483821.9907] device (wlp2s0): supplicant interface state: inactive -> interface_disabled
May 30 23:57:01 fedora.domain NetworkManager[2230]: <info> [1685483821.8960] device (wlp2s0): set-hw-addr: set MAC address to AA:AA:AA:AA:AA:AA (scanning)
May 30 23:56:33 fedora.domain cupsd[2286]: REQUEST localhost - - "POST / HTTP/1.1" 200 185 Renew-Subscription successful-ok
May 30 23:53:52 fedora.domain plasmashell[8658]: [Child 8658, Main Thread] WARNING: JSWindowActorChild::SendRawMessage (Conduits, ConduitClosed) not sent: !CanSend() || !mManager || !mManager->CanSend(): file /builddir/build/BUILD/firefox-113.0.1/dom/ipc/jsactor/JSWindowActorChild.cpp:56
Beyond that, I was working with the 6.3.5 kernel for around 2 hours without issues with several applications (Fedora 38 KDE Spin).
Given the clear indication of the log of 28th May (regarding drivers/gpu/drm/ttm/ttm_bo.c), I opened https://lore.kernel.org/dri-devel/69d51cd5-732f-9dc5-4e12-d68990132c85@my.mail.de/T/#u
./runtests.sh PASS with the kernel-6.3.5-200.fc38.x86_64 builds within a kvm/qemu VM (KDE spin, up to date with all testing repos of F38 enabled) with CPU-passthrough on an AMD Ryzen 6850 PRO host. No third party modules (tainted = 0).
I tested the VM for some minutes with average activities, works fine so far. No errors/issues when using it.
BZ#2193110 was not yet tested (VMs do not seem to be affected by the bug, even with cpu-passthrough).
With regards to my previous comment, I had to return to 6.2.15 because 6.3.4 creates too many kernel errors/freezes so that the system is not usable with this kernel.
However, it causes a new phenomenon: Firefox *¹ freezes and crashes, it cannot be re-started, and pidof firefox no longer works then (it just idles without any return until I do CTRL+C). However, the system does not freeze, although the kernel errors are logged in the same way; it seems to not "spread" from one core/thread to others.
Again, there are massive amounts of kernel errors logged (see BZ#2193110 for the full logs). Some of the more indicative ones may be:
May 28 14:38:41 fedora.domain kernel: WARNING: CPU: 4 PID: 5523 at drivers/gpu/drm/ttm/ttm_bo.c:326 ttm_bo_release+0x289/0x2e0 [ttm]
...
May 28 14:38:41 fedora.domain kernel: WARNING: CPU: 4 PID: 5523 at drivers/gpu/drm/ttm/ttm_bo.c:327 ttm_bo_release+0x296/0x2e0 [ttm]
...
May 28 14:38:41 fedora.domain kernel: kernel BUG at drivers/gpu/drm/ttm/ttm_bo.c:193!
However, once I tried to shut down, it became obvious that the running system was already in a corrupted state, and the shutdown culminated in ...
May 28 14:51:09 fedora.domain kernel: #PF: error_code(0x0000) - not-present page
May 28 14:51:09 fedora.domain kernel: #PF: supervisor read access in kernel mode
May 28 14:51:09 fedora.domain kernel: BUG: unable to handle page fault for address: 0000003000300010
*¹ I had freezes also without Firefox running, so it is not Firefox-specific; see BZ#2193110
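As a reading aid for the #PF lines in the logs above: on x86, the low bits of the page fault error code encode the cause, the access type, and the privilege mode, which is what the kernel spells out in text next to the code. The sketch below decodes only the lowest three bits; the function name `decode_pf_error_code` is hypothetical and this is not how the kernel itself formats the message.

```python
def decode_pf_error_code(code: int) -> dict[str, str]:
    """Decode the three lowest bits of an x86 page fault error code.

    bit 0: 0 = page not present, 1 = protection violation
    bit 1: 0 = read access,      1 = write access
    bit 2: 0 = supervisor mode,  1 = user mode
    Higher bits (reserved-bit violation, instruction fetch, ...) are ignored.
    """
    return {
        "cause": "protection violation" if code & 0x1 else "not-present page",
        "access": "write" if code & 0x2 else "read",
        "mode": "user" if code & 0x4 else "supervisor",
    }


if __name__ == "__main__":
    # The two error codes that appear in the logs of this thread.
    for code in (0x0002, 0x0000):
        print(f"0x{code:04x}", decode_pf_error_code(code))
```

Both codes from the logs decode to a not-present page accessed in supervisor mode: 0x0002 as a write, 0x0000 as a read.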
BZ#2193110 persists with 6.3.4 (tested natively on AMD Ryzen 6850 PRO host). I have already experienced it twice.
Major log entries from the first freeze:
May 28 13:20:25 fedora.domain kernel: RIP: 0010:__kmem_cache_alloc_node+0x1ba/0x320
May 28 13:20:25 fedora.domain kernel: Hardware name: LENOVO 21CHCTO1WW/21CHCTO1WW, BIOS R23ET60W (1.30 ) 09/14/2022
May 28 13:20:25 fedora.domain kernel: CPU: 8 PID: 5056 Comm: kwin_wayla:cs0 Not tainted 6.3.4-201.fc38.x86_64 #1
May 28 13:20:25 fedora.domain kernel: general protection fault, probably for non-canonical address 0x49f8d7efd771dae6: 0000 [#1] PREEMPT SMP NOPTI
-> the desktop clock has frozen at 13:20:24
Full logs at https://bugzilla.redhat.com/show_bug.cgi?id=2193110
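A note on the "non-canonical address" wording in the general protection fault above: on x86_64, a virtual address is canonical only if all of its upper bits are copies of the highest implemented address bit. The sketch below checks this under the assumption of 48-bit virtual addresses (4-level paging), which I have not verified for this machine; the function name `is_canonical` is my own.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """Check whether a 64-bit virtual address is canonical.

    Assumption: va_bits implemented address bits (48 with 4-level paging).
    The address is canonical if bits 63..(va_bits - 1) are all 0 or all 1.
    """
    addr &= (1 << 64) - 1                      # treat input as unsigned 64-bit
    upper = addr >> (va_bits - 1)              # bits 63..(va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1   # same bit width, every bit set
    return upper == 0 or upper == all_ones


if __name__ == "__main__":
    # Faulting address from the general protection fault above.
    print(is_canonical(0x49F8D7EFD771DAE6))
    # Object address from the SLUB report at the top of the thread.
    print(is_canonical(0xFFFF8FD10C902000))
```

An address like 0x49f8d7efd771dae6 looks like random data rather than a pointer, which is consistent with the corrupted freelist pointers in the slub_debug output.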
My host XFS file systems work fine (but I didn't experience the XFS issue on 6.3.3 either).
./runtests.sh PASS with kernel-6.3.4-201.fc38.x86_64, kernel-headers-6.3.3-200.fc38.x86_64, kernel-tools-6.3.3-200.fc38.x86_64 builds within a KVM/QEMU VM (KDE spin, up to date) running on an AMD Ryzen 6850 PRO host. No third party modules (tainted = 0).
I tested the VM for some minutes with average activities, works fine so far. No errors/issues when using it.
I have NOT yet tested if it works on my host (by "work" I mean whether it solves BZ#2193110; the freeze can appear on the host when any VM is running but has not yet occurred within VMs, including with cpu host-passthrough).
I will now test on the host and report in BZ#2193110 if 6.3.4 creates any changes in the behavior.
I have xfs file systems but they have not been affected by the 6.3.3 issue.
Works fine with KDE on x86_64, AMD Ryzen 7, in conjunction with the other updates in testing (testing repos enabled + up to date as of now). No dedicated tests for the CVEs.
Works on x86_64 with average tasks using the interface. I have no FreeIPA/AD domain for a test case. Tested on Fedora KDE with all testing repos enabled and up to date as of now.
@benthaase @generalprobe -> please check if this bug maybe applies to you and provide some complementary data and reports if so: https://bugzilla.redhat.com/show_bug.cgi?id=2193110 (have you had such issues already before? In earlier kernels? What are the circumstances/behaviors?)
@generalprobe Do you also have an AMD Ryzen 7?
@runekl -> I have several XFS file systems deployed without issues. The changelog of the kernel does not contain changes to xfs except some relations to xfstests, and I am not sure that this can have such impacts (?). Have you maybe introduced some other updates along with the kernel? If not, you might file a bug, just to ensure it can be investigated.
./runtests.sh PASS with 6.3.3-100.fc37.x86_64 within a kvm/qemu VM (KDE spin, up to date with all testing repos of F37 enabled) on an AMD Ryzen 6000 mobile series host. No third party modules (tainted = 0).
I tested the VM for some minutes with average activities, works fine so far. No errors/issues when using it.
2x vulnerability status:
.../spec_store_bypass:Mitigation: Speculative Store Bypass disabled via prctl
.../spectre_v1: Mitigation: usercopy/swapgs barriers and __user pointer sanitization
BZ#2187931 and BZ#2187935 not verified
./runtests.sh PASS with 6.3.3-200.fc38.x86_64 within a kvm/qemu VM (KDE spin, up to date with all testing repos of F38 enabled) on an AMD Ryzen 6000 mobile series host. No third party modules (tainted = 0).
I tested the VM for some minutes with average activities, works fine so far. No errors/issues when using it.
2x vulnerability status:
.../spec_store_bypass:Mitigation: Speculative Store Bypass disabled via prctl
.../spectre_v1: Mitigation: usercopy/swapgs barriers and __user pointer sanitization
BZ#2187931 and BZ#2187935 not verified
Ok, now I got it submitted: https://bugzilla.redhat.com/show_bug.cgi?id=2193110
I am already trying :) But it always crashes before I can submit. I have no other machine available atm. I guess I have to downgrade later and try again. But given the overall issue, I am not sure if that will help; it's not certain that Firefox is the origin.
Another supplement: There are two issues. One is definitely in the new Firefox build, occurring only in firefox-112.0.2-1.fc38. The other issue, which correlates with the changed pts, just makes the Firefox issue occur much more often. But there is a dedicated issue in Firefox. When starting movies in Netflix, it always crashes, even if the other issue is absent and other applications work. The same happens on some other pages, too.
Or are there other tests with KDE in here?
Addition to my above -1 Karma: This occurs only once Fedora KDE has been sleeping for some time, which, for unknown reasons, eventually leads to all processes being closed. After that, the above problems appear persistently until reboot. They also appear with Thunderbird then.
What goes along with this changed "state" is that the pts change: e.g., usually, when I open the first terminal on KDE, it is pts/1. After this "close every process" sleep occurrence, the first terminal opens as pts/0. I cannot say if there is a relation, only that there is a correlation (tested 3 times).
I guess this problem only affects Firefox, but Firefox is not the origin. I cannot say much more and have no time to get deeper into the topic atm unfortunately.
./runtests.sh PASS with the kernel-6.3.6-200.fc38.x86_64 builds within a kvm/qemu VM (KDE spin, up to date with all testing repos of F38 enabled) with CPU-passthrough on an AMD Ryzen 6850 PRO host. No third party modules (tainted = 0). I tested the VM for some minutes with average activities, works fine so far. No errors/issues when using it.
Occurrence of Fedora BZ#2193110 / kernel BZ#217528 was not yet tested (my VMs are not affected by the bug). I will do testing on the host in a few days (with and without blacklisting ath11k_pci,ath11k). I will report in kernel BZ#217528 only if 6.3.6 behaves differently than previous kernels in this respect. Otherwise, I give the maintainer time to review the data ;)