mick crane writes:
Hello, I frequently have the system freeze on me and I have to unplug it. It seems to happen only in a browser and *appears* to be triggered by using the mouse. When I am watching a streamed YouTube video or reading blogs, sometimes the screen goes black and everything is unresponsive, and sometimes the screen and everything else freezes but the audio keeps playing. I'd like it to stop doing that. It didn't seem to be an issue a while ago, but now it happens at least once per day, first with bullseye and now with bookworm. I cannot find anything in the logs I have looked at except
[...]
What steps can I take to isolate the problem?

mick@pumpkin:~$ inxi -SGayz
System:
  Kernel: 5.16.0-6-amd64 arch: x86_64 bits: 64 compiler: gcc v: 11.2.0
    parameters: BOOT_IMAGE=/boot/vmlinuz-5.16.0-6-amd64
    root=UUID=1b68069c-ec94-4f42-a35e-6a845008eac7 ro quiet
  Desktop: Xfce v: 4.16.0 tk: Gtk v: 3.24.24 info: xfce4-panel wm: xfwm
    v: 4.16.1 vt: 7 dm: LightDM v: 1.26.0
  Distro: Debian GNU/Linux bookworm/sid
Graphics:
  Device-1: AMD Pitcairn LE GL [FirePro W5000] vendor: Dell driver: radeon
    v: kernel alternate: amdgpu pcie: gen: 3 speed: 8 GT/s lanes: 16 ports:
    active: DP-1 empty: DP-2,DVI-I-1 bus-ID: 03:00.0 chip-ID: 1002:6809
    class-ID: 0300
  Display: x11 server: X.Org v: 1.21.1.3 compositor: xfwm v: 4.16.1 driver:
    X: loaded: radeon unloaded: fbdev,modesetting,vesa gpu: radeon
    display-ID: :0.0 screens: 1
  Screen-1: 0 s-res: 3840x2160 s-dpi: 96 s-size: 1016x571mm (40.00x22.48")
    s-diag: 1165mm (45.88")
  Monitor-1: DP-1 mapped: DisplayPort-0 model: LG (GoldStar) HDR 4K
    serial: <filter> built: 2021 res: 3840x2160 hz: 60 dpi: 163 gamma: 1.2
    size: 600x340mm (23.62x13.39") diag: 690mm (27.2") ratio: 16:9 modes:
    max: 3840x2160 min: 640x480
  OpenGL: renderer: AMD PITCAIRN (DRM 2.50.0 5.16.0-6-amd64 LLVM 13.0.1)
    v: 4.5 Mesa 21.3.8 direct render: Yes
[...]

Hello,

I think I had a very similar issue some months ago (Debian Bullseye). Back then I tried switching to the proprietary AMD driver (?) and it seems to have helped, although on my machine the problem appeared at most once or twice a day back then.
These were the symptoms I had observed:

 * Random conditions (but always GUI application usage)
 * Clock in i3bar hangs
 * X11 mouse cursor can still move
 * Shortly after the hang, screen turns black
 * At least one program continues to run despite the graphics output being "off"
 * SSH connection was not possible during this screen-off state

In later instances, I also observed that the screen turned black only temporarily and came back on after a shorter freeze, with the system becoming usable again.
Here is my output for your inxi command:

$ inxi -SGayz | cat
System:
  Kernel: 5.10.0-13-amd64 x86_64 bits: 64 compiler: gcc v: 10.2.1
    parameters: BOOT_IMAGE=/boot/vmlinuz-5.10.0-13-amd64
    root=UUID=5d6c37b4-341f-4aca-a9f7-2c8a0f39336a ro quiet
  Desktop: i3 4.19.1-non-git info: i3bar, docker dm: startx
  Distro: Debian GNU/Linux 11 (bullseye)
Graphics:
  Device-1: AMD Navi 14 [Radeon Pro W5500] vendor: Dell driver: amdgpu
    v: 5.11.5.21.20 bus ID: 0000:67:00.0 chip ID: 1002:7341 class ID: 0300
  Display: server: X.Org 1.20.11 driver: loaded: amdgpu,ati
    unloaded: fbdev,modesetting,radeon,vesa display ID: :0 screens: 1
  Screen-1: 0 s-res: 7680x1440 s-dpi: 96 s-size: 2032x381mm (80.0x15.0")
    s-diag: 2067mm (81.4")
  Monitor-1: DisplayPort-0 res: 1920x1080 hz: 60 dpi: 93
    size: 527x296mm (20.7x11.7") diag: 604mm (23.8")
  Monitor-2: DisplayPort-1 res: 2560x1440 hz: 60 dpi: 109
    size: 597x336mm (23.5x13.2") diag: 685mm (27")
  Monitor-3: DisplayPort-2 res: 1280x1024 hz: 60 dpi: 96
    size: 338x270mm (13.3x10.6") diag: 433mm (17")
  Monitor-4: DisplayPort-3 res: 1920x1080 hz: 60 dpi: 85
    size: 575x323mm (22.6x12.7") diag: 660mm (26")
  OpenGL: renderer: AMD Radeon Pro W5500 v: 4.6.14739 Core Profile Context
    FireGL 21.20 compat-v: 4.6.14739 direct render: Yes

Back when the problem was still appearing, I could observe the following messages in syslog after reboot (sorry, long lines...):
Sep 18 13:11:36 masysma-18 kernel: [ 2045.986736] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=3179, emitted seq=3181
Sep 18 13:11:36 masysma-18 kernel: [ 2045.986935] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
Sep 18 13:11:36 masysma-18 kernel: [ 2045.986944] amdgpu 0000:67:00.0: amdgpu: GPU reset begin!
Sep 18 13:11:38 masysma-18 kernel: [ 2047.719111] amdgpu 0000:67:00.0: amdgpu: failed send message: DisallowGfxOff (42) param: 0x00000000 response 0xffffffc2
Sep 18 13:11:38 masysma-18 kernel: [ 2047.719114] amdgpu 0000:67:00.0: amdgpu: Failed to disable gfxoff!
Sep 18 13:11:40 masysma-18 kernel: [ 2049.778328] amdgpu 0000:67:00.0: amdgpu: Msg issuing pre-check failed and SMU may be not in the right state!
Sep 18 13:11:41 masysma-18 kernel: [ 2051.441397] amdgpu 0000:67:00.0: amdgpu: Msg issuing pre-check failed and SMU may be not in the right state!
Sep 18 13:11:42 masysma-18 kernel: [ 2051.634059] amdgpu 0000:67:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Sep 18 13:11:42 masysma-18 kernel: [ 2051.634111] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
Sep 18 13:11:42 masysma-18 kernel: [ 2051.813223] amdgpu 0000:67:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Sep 18 13:11:42 masysma-18 kernel: [ 2051.813267] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
Sep 18 13:11:43 masysma-18 kernel: [ 2053.354507] amdgpu 0000:67:00.0: amdgpu: Msg issuing pre-check failed and SMU may be not in the right state!
Sep 18 13:11:43 masysma-18 kernel: [ 2053.354509] amdgpu 0000:67:00.0: amdgpu: Failed to disable smu features except BACO.
Sep 18 13:11:43 masysma-18 kernel: [ 2053.354511] amdgpu 0000:67:00.0: amdgpu: Fail to disable dpm features!
Sep 18 13:11:43 masysma-18 kernel: [ 2053.354570] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <smu> failed -62
Sep 18 13:11:43 masysma-18 kernel: [ 2053.390519] [drm] free PSP TMR buffer
Sep 18 13:11:43 masysma-18 kernel: [ 2053.423405] amdgpu 0000:67:00.0: amdgpu: BACO reset
Sep 18 13:11:45 masysma-18 kernel: [ 2054.968108] amdgpu 0000:67:00.0: amdgpu: Msg issuing pre-check failed and SMU may be not in the right state!
Sep 18 13:11:45 masysma-18 kernel: [ 2054.968110] amdgpu 0000:67:00.0: amdgpu: Failed to enter BACO state!
Sep 18 13:11:45 masysma-18 kernel: [ 2054.968112] amdgpu 0000:67:00.0: amdgpu: ASIC reset failed with error, -62 for drm dev, 0000:67:00.0
Sep 18 13:11:45 masysma-18 kernel: [ 2054.968154] amdgpu 0000:67:00.0: amdgpu: GPU reset(1) failed
Sep 18 13:11:45 masysma-18 kernel: [ 2054.989004] amdgpu 0000:67:00.0: amdgpu: GPU reset end with ret = -62
Sep 18 13:11:55 masysma-18 kernel: [ 2065.186711] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=3181, emitted seq=3181
Sep 18 13:11:55 masysma-18 kernel: [ 2065.186910] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
Sep 18 13:11:55 masysma-18 kernel: [ 2065.186919] amdgpu 0000:67:00.0: amdgpu: GPU reset begin!

I waded through <https://gitlab.freedesktop.org/drm/amd/-/issues/892> to gather ideas about how to fix it. Most of the "solutions" seemed very hacky though, and I did not try them thoroughly.
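To check whether your freezes leave similar traces, you could search the log of the previous boot for GPU hang or reset messages like the ones above. Below is a minimal sketch, assuming Debian's rsyslog writes /var/log/syslog, or that the systemd journal is persistent (i.e. /var/log/journal exists) so that `journalctl -k -b -1` can show the kernel messages of the previous boot. The function name gpu_hang_scan is hypothetical, and so is the pattern list: the amdgpu patterns come from the log above, while "GPU lockup" and "ring N stalled" are my assumption of what the older radeon driver prints, so treat them as a starting point only.

```shell
#!/bin/sh
# Hypothetical helper: print lines that look like GPU hang/reset messages.
# Usage: gpu_hang_scan [LOGFILE]
#   LOGFILE defaults to /var/log/syslog; pass "-" to read standard input.
gpu_hang_scan() {
    # amdgpu patterns are taken from the log above; "GPU lockup" and
    # "ring N stalled" are assumed radeon wording. Extend as needed.
    grep -E 'amdgpu_job_timedout|GPU reset|GPU lockup|ring [^ ]+ (timeout|stalled)' \
        "${1:-/var/log/syslog}"
}

# Example invocations (the second needs a persistent journal):
#   gpu_hang_scan /var/log/syslog.1
#   journalctl -k -b -1 | gpu_hang_scan -
```

Note that after a hard power-off, the last messages before the hang may never have reached the disk, so an empty result does not prove that the GPU driver was not involved.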
This is what I noted from installing the proprietary driver:

https://www.amd.com/en/support/professional-graphics/radeon-pro/radeon-pro-w5000-series/radeon-pro-w5500

./amdgpu-pro-install
rm /etc/X11/xorg.conf

Also, I noted that I had had a DP connectivity issue that was possibly responsible for the cases where the monitor came back with a picture after the hang. It was fixed by re-attaching the DP cable...
HTH and YMMV
Linux-Fan öö