On Mon, Dec 8, 2025 at 8:59 AM Mack Wang <[email protected]> wrote:
>
> Hi,
>
> Starting with kernel version 6.18, I'm experiencing frequent GPU failures and
> resets that render the computer nearly unusable. The screen flickers and
> eventually either blacks out (most cases) or recovers (fewer cases). Even if
> I switch to another GPU and use the Radeon GPU only for rendering, it can
> fail and eventually kill the app that is running on it. The problem isn't
> present in 6.17.
>
> My dmesg logs show something like this (a successful reset):
>
> [  585.109939] amdgpu 0000:06:00.0: amdgpu: Dumping IP State
> [  585.111758] amdgpu 0000:06:00.0: amdgpu: Dumping IP State Completed
> [  585.111839] amdgpu 0000:06:00.0: amdgpu: [drm] AMDGPU device coredump file
> has been created
> [  585.111841] amdgpu 0000:06:00.0: amdgpu: [drm] Check your
> /sys/class/drm/card2/device/devcoredump/data
> [  585.111844] amdgpu 0000:06:00.0: amdgpu: ring gfx_0.1.0 timeout, signaled
> seq=31692, emitted seq=31694
> [  585.111847] amdgpu 0000:06:00.0: amdgpu:  Process kwin_wayland pid 114
> thread kwin_wayla:cs0 pid 514
> [  585.111849] amdgpu 0000:06:00.0: amdgpu: Starting gfx_0.1.0 ring reset
> [  585.269485] amdgpu 0000:06:00.0: amdgpu: Ring gfx_0.1.0 reset failed
> [  585.269490] amdgpu 0000:06:00.0: amdgpu: GPU reset begin!. Source:  1
> [  585.331433] amdgpu 0000:06:00.0: amdgpu: MODE2 reset
> [  585.338731] amdgpu 0000:06:00.0: amdgpu: GPU reset succeeded, trying to
> resume
> [  585.339090] [drm] PCIE GART of 1024M enabled (table at 0x000000F41FC00000).
> [  585.339113] amdgpu 0000:06:00.0: amdgpu: PSP is resuming...
> [  585.361053] amdgpu 0000:06:00.0: amdgpu: reserve 0xa00000 from 0xf41e000000
> for PSP TMR
> [  585.593433] amdgpu 0000:06:00.0: amdgpu: RAS: optional ras ta ucode is not
> available
> [  585.602279] amdgpu 0000:06:00.0: amdgpu: RAP: optional rap ta ucode is not
> available
> [  585.602281] amdgpu 0000:06:00.0: amdgpu: SECUREDISPLAY: optional
> securedisplay ta ucode is not available
> [  585.602282] amdgpu 0000:06:00.0: amdgpu: SMU is resuming...
> [  585.602569] amdgpu 0000:06:00.0: amdgpu: SMU is resumed successfully!
> [  585.602750] amdgpu 0000:06:00.0: amdgpu: kiq ring mec 2 pipe 1 q 0
> [  585.607508] amdgpu 0000:06:00.0: amdgpu: [drm] DMUB hardware initialized:
> version=0x05002C00
> [  585.880737] amdgpu 0000:06:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0
> on hub 0
> [  585.880742] amdgpu 0000:06:00.0: amdgpu: ring gfx_0.1.0 uses VM inv eng 1
> on hub 0
> [  585.880743] amdgpu 0000:06:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 4
> on hub 0
> [  585.880744] amdgpu 0000:06:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 5
> on hub 0
> [  585.880745] amdgpu 0000:06:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6
> on hub 0
> [  585.880746] amdgpu 0000:06:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7
> on hub 0
> [  585.880747] amdgpu 0000:06:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8
> on hub 0
> [  585.880748] amdgpu 0000:06:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9
> on hub 0
> [  585.880749] amdgpu 0000:06:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10
> on hub 0
> [  585.880751] amdgpu 0000:06:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11
> on hub 0
> [  585.880752] amdgpu 0000:06:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng
> 12 on hub 0
> [  585.880753] amdgpu 0000:06:00.0: amdgpu: ring sdma0 uses VM inv eng 13 on
> hub 0
> [  585.880754] amdgpu 0000:06:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0
> on hub 8
> [  585.880755] amdgpu 0000:06:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1
> on hub 8
> [  585.880756] amdgpu 0000:06:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4
> on hub 8
> [  585.880757] amdgpu 0000:06:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on
> hub 8
> [  585.884345] amdgpu 0000:06:00.0: amdgpu: GPU reset(1) succeeded!
> [  585.884371] amdgpu 0000:06:00.0: [drm] device wedged, but recovered through
> reset
> [  585.897300] amdgpu 0000:06:00.0: amdgpu: [drm] *ERROR* Failed to initialize
> parser -125!
>
> I'm on an ASUS laptop with Ryzen 7940HX/Radeon 610M. I'm using a distribution
> kernel, but the maintainers are slow to respond, so forgive me for sending
> messages here. I boot with the kernel command-line parameter
> amdgpu.dcdebugmask=0x10 to work around kernel lockups, which are a separate
> problem that's been around since ~6.12.
>
> I've collected more dmesg logs beyond what's shown above, as well as
> device coredumps from /sys/class/drm/card/device/devcoredump/data. I'm also
> happy to help with bisecting the problem if the range isn't too large. Let me
> know how I can help.

Please file a ticket here:
https://gitlab.freedesktop.org/drm/amd/-/issues
and if you could bisect, that would be really helpful.
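
When attaching logs to the ticket, a short summary of which rings timed out can help. The following is only a minimal sketch, assuming the dmesg output is saved with each message on a single line (the excerpt quoted above was wrapped by the mail client) and that timeout lines follow the "ring ... timeout, signaled seq=..., emitted seq=..." wording shown in that excerpt. The names find_ring_timeouts and RingTimeout are hypothetical and not part of any kernel tooling.

```python
import re
from typing import NamedTuple

# Pattern modeled on the timeout line in the quoted log, unwrapped, e.g.:
# [  585.111844] amdgpu 0000:06:00.0: amdgpu: ring gfx_0.1.0 timeout, signaled seq=31692, emitted seq=31694
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+amdgpu\s+(?P<dev>\S+):\s+amdgpu:\s+"
    r"ring\s+(?P<ring>\S+)\s+timeout,\s+signaled\s+seq=(?P<signaled>\d+),\s+"
    r"emitted\s+seq=(?P<emitted>\d+)"
)


class RingTimeout(NamedTuple):
    """One ring timeout event extracted from dmesg text (hypothetical helper type)."""
    timestamp: float   # seconds since boot, from the dmesg prefix
    device: str        # PCI address, e.g. 0000:06:00.0
    ring: str          # ring name, e.g. gfx_0.1.0
    signaled: int      # last fence sequence number that signaled
    emitted: int       # last fence sequence number that was emitted


def find_ring_timeouts(dmesg_text: str) -> list[RingTimeout]:
    """Return every ring timeout found in the given dmesg text, in order."""
    events = []
    for m in TIMEOUT_RE.finditer(dmesg_text):
        events.append(
            RingTimeout(
                timestamp=float(m.group("ts")),
                device=m.group("dev"),
                ring=m.group("ring"),
                signaled=int(m.group("signaled")),
                emitted=int(m.group("emitted")),
            )
        )
    return events
```

For example, find_ring_timeouts(open("dmesg.txt").read()) would list the events from a saved log, where dmesg.txt is a placeholder file name. Lines that do not match the pattern are simply ignored, so the full unfiltered log should still be attached to the ticket.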

Thanks!

Alex