+ amdgpu folks

On Tue, Apr 29, 2025 at 02:51:56PM +0200, Marcus Rückert wrote:
> Hardware:
> - ASUS ROG Swift OLED PG27AQDP @ 480 Hz
> - LG 27GL850-B @ 144 Hz
> - XFX Mercury Radeon RX 9070 XT OC Gaming Edition with RGB, 16GB GDDR6, HDMI,
>   3x DP RX-97TRGBBB9
> - Ryzen 9 9950X3D on ASUS ProArt X870E-Creator WiFi
> - be quiet! Dark Power 13 850W ATX 3.0
>
> Software:
> - kernel-default-6.15~rc4-1.1.g62ec7c7.x86_64 from
>   https://build.opensuse.org/project/show/Kernel:HEAD
> - Mesa-25.1+git442.5841d44f9-1747.1.x86_64 from
>   https://build.opensuse.org/package/show/home:darix:playground/Mesa
> - GE-Proton 9-27
>   https://github.com/GloriousEggroll/proton-ge-custom/releases/tag/GE-Proton9-27
> - Overwatch via Steam
>
> ```
> [Mon Apr 28 23:10:56 2025] [ T10460] amdgpu 0000:03:00.0: amdgpu: Dumping IP State
> [Mon Apr 28 23:10:56 2025] [ T10460] amdgpu 0000:03:00.0: amdgpu: Dumping IP State Completed
> [Mon Apr 28 23:10:56 2025] [ T10460] amdgpu 0000:03:00.0: amdgpu: [drm] AMDGPU device coredump file has been created
> [Mon Apr 28 23:10:56 2025] [ T10460] amdgpu 0000:03:00.0: amdgpu: [drm] Check your /sys/class/drm/card1/device/devcoredump/data
> [Mon Apr 28 23:10:56 2025] [ T10460] amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 timeout, but soft recovered
> [Mon Apr 28 23:11:07 2025] [ T10460] amdgpu 0000:03:00.0: amdgpu: Dumping IP State
> [Mon Apr 28 23:11:07 2025] [ T10460] amdgpu 0000:03:00.0: amdgpu: Dumping IP State Completed
> [Mon Apr 28 23:11:07 2025] [ T10460] amdgpu 0000:03:00.0: amdgpu: [drm] AMDGPU device coredump file has been created
> [Mon Apr 28 23:11:07 2025] [ T10460] amdgpu 0000:03:00.0: amdgpu: [drm] Check your /sys/class/drm/card1/device/devcoredump/data
> [Mon Apr 28 23:11:07 2025] [ T10460] amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 timeout, but soft recovered
> ```
>
> Usually I hit this about once a day, but yesterday it was especially bad:
>
> ```
> Apr 28 21:46:57 kernel: amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 timeout, but soft recovered
> Apr 28 21:47:08 kernel: amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 timeout, but soft recovered
> Apr 28 21:47:18 kernel: amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 timeout, but soft recovered
> Apr 28 21:47:28 kernel: amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 timeout, but soft recovered
> Apr 28 21:54:34 kernel: amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 timeout, but soft recovered
> Apr 28 22:00:40 kernel: amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 timeout, but soft recovered
> Apr 28 22:00:50 kernel: amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 timeout, but soft recovered
> Apr 28 22:01:00 kernel: amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 timeout, but soft recovered
> Apr 28 23:10:56 kernel: amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 timeout, but soft recovered
> Apr 28 23:11:07 kernel: amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 timeout, but soft recovered
> ```
>
> Together with my coworker Patrik Jakobsson and Takashi Iwai we already chased
> down a few other issues (like the dreaded flip_done), but this last issue
> remains. We first backported some fixes to our 6.14.x kernel for flip_done.
> To get even more fixes I switched to the 6.15~rc kernels.
>
> I then also moved to Mesa 25.1~rc, which didn't fix it, so now I am on a
> snapshot package of main.
>
> Some observations: while gaming I started running
> https://github.com/Umio-Yasuno/amdgpu_top on the second monitor to see
> whether overheating might be an issue.
>
> But the memory temps are at around 82 and the GPU core itself is usually
> lower.
> One observation is that the card is supposed to have a boost clock of 3100MHz
> but amdgpu_top sees it boost over 3200. I tried both onboard BIOSes and the
> behavior is the same.
>
> Currently I run both my Wayland session and my game with RADV_DEBUG=nohiz,
> but that didn't provide more details. Adding nodcc drops the performance from
> 480-500 Hz (the card could go faster, but I limit the game to 500) to
> 330-360.
>
> Please let me know if I can provide more details.
>
> darix
>
> ```
> --
> Always remember:
> Never accept the world as it appears to be.
> Dare to see it for what it could be.
> The world can always use more heroes.
> ```
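
For triage it can help to see how the soft-recovered timeouts cluster in time. The following is only a minimal sketch, not part of the original report: it assumes journal-style lines in the format quoted above, takes the year as a parameter because those timestamps omit it, and uses a hypothetical 15-second gap to decide whether two timeouts belong to the same burst. The names `parse_timeouts`, `group_bursts` and `SAMPLE` are made up for illustration.

```python
import re
from datetime import datetime

# Matches journal-style lines such as:
#   Apr 28 21:46:57 kernel: amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 timeout, but soft recovered
TIMEOUT_RE = re.compile(
    r"^(?P<stamp>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) kernel: .*"
    r"ring (?P<ring>\S+) timeout, but soft recovered"
)


def parse_timeouts(lines, year=2025):
    """Return (datetime, ring) for every soft-recovered ring timeout line."""
    events = []
    for line in lines:
        match = TIMEOUT_RE.search(line.strip())
        if match:
            # The short journal timestamp has no year, so the caller supplies it.
            stamp = datetime.strptime(
                f"{year} {match['stamp']}", "%Y %b %d %H:%M:%S"
            )
            events.append((stamp, match["ring"]))
    return events


def group_bursts(events, max_gap_seconds=15):
    """Group consecutive events spaced at most max_gap_seconds apart.

    The 15 s default is an arbitrary assumption chosen for illustration.
    """
    bursts = []
    for stamp, _ring in events:
        if bursts and (stamp - bursts[-1][-1]).total_seconds() <= max_gap_seconds:
            bursts[-1].append(stamp)
        else:
            bursts.append([stamp])
    return bursts


# The ten timestamps from the journal excerpt quoted in the report.
SAMPLE = [
    f"Apr 28 {t} kernel: amdgpu 0000:03:00.0: amdgpu: "
    "ring gfx_0.0.0 timeout, but soft recovered"
    for t in (
        "21:46:57", "21:47:08", "21:47:18", "21:47:28", "21:54:34",
        "22:00:40", "22:00:50", "22:01:00", "23:10:56", "23:11:07",
    )
]

if __name__ == "__main__":
    for burst in group_bursts(parse_timeouts(SAMPLE)):
        print(f"{burst[0]:%H:%M:%S}  {len(burst)} timeout(s)")
```

With the quoted timestamps this groups the ten events into bursts of 4, 1, 3 and 2, with roughly 10 seconds between the timeouts inside a burst.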

--
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette