Also broken on a Fedora 42 host running Fedora 42/43/44 Docker containers with ROCm/PyTorch installed:
Broken with kernels 6.17.9, 6.17.10 and 6.17.11; still works with kernel 6.17.8.
The SEGV occurs in the ROCm runtime, fwiw, when running e.g. clinfo and various PyTorch one-liners, such as the one mentioned here: https://gitlab.freedesktop.org/drm/amd/-/issues/4751
APU: AMD Ryzen AI 9 HX 370 / Strix Point / ROCm target gfx1150
Laptop: ThinkPad P14s-AMD-Gen6, Fedora 42 updated to latest, with dyndbg enabled on the kernel cmdline to log loaded firmware blobs
This update has been submitted for testing by pbrobinson.
This update's test gating status has been changed to 'waiting'.
pbrobinson edited this update.
This update's test gating status has been changed to 'passed'.
This update has been pushed to testing.
Works.
This update can be pushed to stable now if the maintainer wishes
Works for me..
Works well on a Framework 13 with AMD Ryzen AI 300 Series CPU.
Works
Works great! LGTM! =)
This update has been submitted for stable by pbrobinson.
No breakages detected on ThinkPad E14 Gen 4 (Intel)
no issues with Thinkpad P1 Gen4 (Intel)
No issues spotted with 9950X3D + 7900XTX and Lenovo P1 Gen7
This update has been pushed to stable.
This update broke ROCm support for me; I had to revert to 20251111.
Using an AMD AI MAX+ 395
@ben3892 can you elaborate a bit more on what broke? Has a bug been filed somewhere?
I have not filed a bug, sorry; I am not familiar with the process yet.
But the root cause was identified and fixed upstream: https://gitlab.com/kernel-firmware/linux-firmware/-/merge_requests/810
OK, that is a superset of the fixes that I was told were relevant to the recent Strix Point and Strix Halo graphical issues. I've filed a bug requesting a new build: https://bugzilla.redhat.com/show_bug.cgi?id=2420062
@ben3892 - also, there is a recent ROCm fix specific to Strix Halo that will be patched in soon, in case you end up seeing more instability issues. There is a corresponding fix in the kernel, but all the current Fedora kernels already have it. https://github.com/ROCm/rocm-systems/pull/1986
Also broken on a Fedora 42 host running Fedora 42/43/44 Docker containers with ROCm/PyTorch installed:
As a complete GPU noob, I don't know what to make of the technical content of the related posts; there are many. Also, the latest kernel.org change lists 5 amdgpu/dcn_xxx blobs https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/?id=6c60d1128566f8cc9c3ddd7ad1db7adec824f71b, as opposed to the 3 in the Red Hat bug ticket https://bugzilla.redhat.com/show_bug.cgi?id=2420062#c6. Note that only 1 of the changed firmware blobs, namely amdgpu/dcn_3_5_dmcub.bin, is actually loaded by the amdgpu kernel module, according to the dyndbg logs on my machine.
I cannot downgrade to the previous linux-firmware:20251111, as that package has disappeared from upstream. I could reinstall the locally saved package; however, as the machine is my daily driver, I would rather not fiddle with things when I do not really know what I'd be doing.
Would you perchance have an estimate of when a new package will be available for retesting? I'd be very grateful for any hints. Thanks for your efforts.
Same thing here, downgraded to linux-firmware 20251111. In the newest version, unrecoverable freezes occur when using OpenCL, whether through rusticl or ROCm. This mostly happens under larger loads, for example when using darktable. I'm hoping the next version of linux-firmware will fix this.
Update: A patch has been made, just waiting for a release now. https://gitlab.com/kernel-firmware/linux-firmware/-/commit/3d5c8135206cef364e7d353711b3e7358a90d152 I'll try to build this myself and test it.