Hello,
I have a PC that worked fine for many years, but I had not used it for
half a year. Yesterday I wanted to use it again, but sway appears to be
crashing amdgpu in DRM. The components are:
- ASUS System Product Name/TUF GAMING B650M-PLUS
- AMD Ryzen 9 7950X 16-Core Processor
- Debian trixie
I already tried the following:
- Upgrading to Debian forky
- Booting a Debian trixie live CD
- Installing the latest AMD GPU firmware
- Updating the BIOS to the latest version
To reproduce the issue, I boot Linux, start sway, and open an alacritty
terminal with tmux inside. amdgpu crashes immediately. Here are the full
dmesg, the amdgpu device coredump, and a video:
https://tg.st/u/dmesg_9f62587406fb808dc4d91d41029ccf88ceeadf13e1f91d65c27b57536f375550.txt
https://tg.st/u/amdgpu_device_coredump_data_a25f2060c56260bb46ac95ee3123969d5127bf31b29ea3adfe3feeac67bf4edc.zst
https://tg.st/u/VID_20251222_071051104.mp4
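
For anyone who wants to capture the same kind of dump on their own
machine, here is a minimal sketch of copying the devcoredump node that
the kernel log points to. The function name save_devcoredump and the
destination file name are illustrative assumptions, not the way the file
linked above was produced, and card0 may be a different card index on
other systems.

```python
import shutil
from pathlib import Path

# Path printed by the kernel log in this report; the card index may differ elsewhere.
DEVCOREDUMP = Path("/sys/class/drm/card0/device/devcoredump/data")


def save_devcoredump(dest, source=DEVCOREDUMP):
    """Copy the devcoredump to dest.

    Returns the number of bytes written, or None if no dump is present.
    """
    source = Path(source)
    if not source.exists():
        # No pending dump (or a different card index): nothing to copy.
        return None
    with source.open("rb") as src, open(dest, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return Path(dest).stat().st_size


if __name__ == "__main__":
    # Reading the sysfs node usually requires root.
    size = save_devcoredump("amdgpu_device_coredump_data.bin")
    print("no devcoredump present" if size is None else f"saved {size} bytes")
```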
[ 57.342777] amdgpu 0000:0b:00.0: amdgpu: Dumping IP State
[ 57.343822] amdgpu 0000:0b:00.0: amdgpu: Dumping IP State Completed
[ 57.343869] amdgpu 0000:0b:00.0: amdgpu: [drm] AMDGPU device coredump file has been created
[ 57.343871] amdgpu 0000:0b:00.0: amdgpu: [drm] Check your /sys/class/drm/card0/device/devcoredump/data
[ 57.343872] amdgpu 0000:0b:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=106, emitted seq=108
[ 57.343873] amdgpu 0000:0b:00.0: amdgpu: Process sway pid 2021 thread sway:cs0 pid 2317
[ 57.343875] amdgpu 0000:0b:00.0: amdgpu: Starting gfx_0.0.0 ring reset
[ 57.485168] amdgpu 0000:0b:00.0: amdgpu: Ring gfx_0.0.0 reset failed
[ 57.485170] amdgpu 0000:0b:00.0: amdgpu: GPU reset begin!
[ 57.609921] amdgpu 0000:0b:00.0: amdgpu: MODE2 reset
[ 57.616920] amdgpu 0000:0b:00.0: amdgpu: GPU reset succeeded, trying to resume
[ 57.617008] [drm] PCIE GART of 1024M enabled (table at 0x000000F41FC00000).
[ 57.617024] amdgpu 0000:0b:00.0: amdgpu: PSP is resuming...
[ 57.638326] amdgpu 0000:0b:00.0: amdgpu: reserve 0xa00000 from 0xf41e000000 for PSP TMR
[ 57.832236] amdgpu 0000:0b:00.0: amdgpu: RAS: optional ras ta ucode is not available
[ 57.837959] amdgpu 0000:0b:00.0: amdgpu: RAP: optional rap ta ucode is not available
[ 57.837961] amdgpu 0000:0b:00.0: amdgpu: SECUREDISPLAY: optional securedisplay ta ucode is not available
[ 57.837963] amdgpu 0000:0b:00.0: amdgpu: SMU is resuming...
[ 57.838869] amdgpu 0000:0b:00.0: amdgpu: SMU is resumed successfully!
[ 57.839132] amdgpu 0000:0b:00.0: amdgpu: kiq ring mec 2 pipe 1 q 0
[ 57.842333] amdgpu 0000:0b:00.0: amdgpu: [drm] DMUB hardware initialized: version=0x05002C00
[ 57.944932] amdgpu 0000:0b:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[ 57.944935] amdgpu 0000:0b:00.0: amdgpu: ring gfx_0.1.0 uses VM inv eng 1 on hub 0
[ 57.944936] amdgpu 0000:0b:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 4 on hub 0
[ 57.944937] amdgpu 0000:0b:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 5 on hub 0
[ 57.944938] amdgpu 0000:0b:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[ 57.944938] amdgpu 0000:0b:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[ 57.944939] amdgpu 0000:0b:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[ 57.944939] amdgpu 0000:0b:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[ 57.944940] amdgpu 0000:0b:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[ 57.944940] amdgpu 0000:0b:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[ 57.944941] amdgpu 0000:0b:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 12 on hub 0
[ 57.944941] amdgpu 0000:0b:00.0: amdgpu: ring sdma0 uses VM inv eng 13 on hub 0
[ 57.944942] amdgpu 0000:0b:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 8
[ 57.944943] amdgpu 0000:0b:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 8
[ 57.944943] amdgpu 0000:0b:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 8
[ 57.944944] amdgpu 0000:0b:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 8
[ 57.948092] amdgpu 0000:0b:00.0: amdgpu: GPU reset(1) succeeded!
[ 57.948107] amdgpu 0000:0b:00.0: [drm] device wedged, but recovered through reset
[ 57.961832] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
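
In case it helps when comparing logs from other machines, here is a
small sketch that pulls the ring timeout line out of a dmesg excerpt
like the one above, including when the mail client has wrapped it. The
function name parse_ring_timeout and the "pending" field are my own
illustrative assumptions; reading emitted minus signaled as the number
of submitted jobs that never completed is my interpretation of the
message, and the sketch ignores sequence-number wraparound.

```python
import re

# Matches: "ring gfx_0.0.0 timeout, signaled seq=106, emitted seq=108"
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, signaled seq=(?P<signaled>\d+), "
    r"emitted seq=(?P<emitted>\d+)"
)


def parse_ring_timeout(log_text):
    """Return details of the first ring timeout in log_text, or None."""
    # Collapse all whitespace so hard-wrapped log lines still match.
    flat = " ".join(log_text.split())
    match = TIMEOUT_RE.search(flat)
    if match is None:
        return None
    signaled = int(match.group("signaled"))
    emitted = int(match.group("emitted"))
    return {
        "ring": match.group("ring"),
        "signaled": signaled,
        "emitted": emitted,
        # Assumption: fences emitted but not yet signaled (no wraparound handling).
        "pending": emitted - signaled,
    }


if __name__ == "__main__":
    sample = (
        "[ 57.343872] amdgpu 0000:0b:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled\n"
        "seq=106, emitted seq=108\n"
    )
    print(parse_ring_timeout(sample))
```

For the excerpt above this reports ring gfx_0.0.0 with signaled=106,
emitted=108, and pending=2.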
I'd be grateful for any pointers toward resolving the issue, and I'm
available for debugging.
Cheers,
Thomas