You have been subscribed to a public bug:

Since last week's 6.8 kernel update, the amdgpu driver regularly crashes after a while, at least while the screen saver is running.
No recovery, so black screen afterward and mandatory reboot.

# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.5 LTS
Release:        22.04
Codename:       jammy

# uname -a
Linux server 6.8.0-40-generic #40~22.04.3-Ubuntu SMP PREEMPT_DYNAMIC Tue Jul 30 17:30:19 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

# lshw -class video
  *-display
       description: VGA compatible controller
       product: Lexa PRO [Radeon 540/540X/550/550X / RX 540X/550/550X]
       vendor: Advanced Micro Devices, Inc. [AMD/ATI]
       physical id: 0
       bus info: pci@0000:06:00.0
       logical name: /dev/fb0
       version: c7
       width: 64 bits
       clock: 33MHz
       capabilities: pm pciexpress msi vga_controller bus_master cap_list rom fb
       configuration: depth=32 driver=amdgpu latency=0 resolution=1920,1080
       resources: irq:57 memory:d0000000-dfffffff memory:e0000000-e01fffff ioport:e000(size=256) memory:fcf00000-fcf3ffff memory:c0000-dffff

# lshw -class processor
  *-cpu
       description: CPU
       product: AMD Ryzen 7 3700X 8-Core Processor
       vendor: Advanced Micro Devices [AMD]
       physical id: 11
       bus info: cpu@0
       version: 23.113.0
       serial: Unknown
       slot: AM4
       size: 4332MHz
       capacity: 4426MHz
       width: 64 bits
       clock: 100MHz
       capabilities: lm fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp x86-64 constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists
pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sev sev_es cpufreq
       configuration: cores=8 enabledcores=8 microcode=141561889 threads=16

Relevant part of the journal:

Sep 18 06:55:28 server kernel: [309106.875597] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=689970, emitted seq=689972
Sep 18 06:55:28 server kernel: [309106.876158] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xwayland pid 4403 thread Xwayland:cs0 pid 4408
Sep 18 06:55:28 server kernel: [309106.876702] amdgpu 0000:06:00.0: amdgpu: GPU reset begin!
Sep 18 06:55:28 server kernel: [309106.876795] amdgpu 0000:06:00.0: amdgpu:
Sep 18 06:55:28 server kernel: [309106.876795] last message was failed ret is 65535
Sep 18 06:55:28 server kernel: [309106.876803] amdgpu 0000:06:00.0: amdgpu:
Sep 18 06:55:28 server kernel: [309106.876803] last message was failed ret is 65535
Sep 18 06:55:28 server kernel: [309106.876810] amdgpu 0000:06:00.0: amdgpu:
Sep 18 06:55:28 server kernel: [309106.876810] last message was failed ret is 65535
Sep 18 06:55:28 server kernel: [309106.876817] amdgpu 0000:06:00.0: amdgpu:
Sep 18 06:55:28 server kernel: [309106.876817] last message was failed ret is 65535
Sep 18 06:55:28 server kernel: [309106.876824] amdgpu 0000:06:00.0: amdgpu:
Sep 18 06:55:28 server kernel: [309106.876824] last message was failed ret is 65535
Sep 18 06:55:28 server kernel: [309106.876831] amdgpu 0000:06:00.0: amdgpu:
Sep 18 06:55:28 server kernel: [309106.876831] last message was failed ret is 65535
Sep 18 06:55:28 server kernel: [309106.876837] amdgpu 0000:06:00.0: amdgpu:
Sep 18 06:55:28 server kernel: [309106.876837] last message was failed ret is 65535
Sep 18 06:55:28 server kernel: [309106.876845] amdgpu 0000:06:00.0: amdgpu:
Sep 18 06:55:28 server kernel: [309106.876845] last message was failed ret is 65535
Sep 18 06:55:28 server kernel: [309106.876852] amdgpu 0000:06:00.0: amdgpu:
Sep 18 06:55:28 server kernel: [309106.876852] last message was failed ret is 65535
Sep 18 06:55:28 server kernel: [309106.876859] amdgpu 0000:06:00.0: amdgpu:
Sep 18 06:55:28 server kernel: [309106.876859] last message was failed ret is 65535
Sep 18 06:55:28 server kernel: [309106.876866] amdgpu 0000:06:00.0: amdgpu:
Sep 18 06:55:28 server kernel: [309106.876866] last message was failed ret is 65535
Sep 18 06:55:28 server kernel: [309106.876872] amdgpu 0000:06:00.0: amdgpu:
Sep 18 06:55:28 server kernel: [309106.876872] last message was failed ret is 65535
Sep 18 06:55:28 server kernel: [309106.876879] amdgpu 0000:06:00.0: amdgpu:
Sep 18 06:55:28 server kernel: [309106.876879] last message was failed ret is 65535
Sep 18 06:55:28 server kernel: [309106.876886] amdgpu 0000:06:00.0: amdgpu:
Sep 18 06:55:28 server kernel: [309106.876886] last message was failed ret is 65535
Sep 18 06:55:28 server kernel: [309106.876893] amdgpu 0000:06:00.0: amdgpu:
Sep 18 06:55:28 server kernel: [309106.876893] last message was failed ret is 65535
Sep 18 06:55:28 server kernel: [309106.876900] amdgpu 0000:06:00.0: amdgpu:
Sep 18 06:55:28 server kernel: [309106.876900] last message was failed ret is 65535
Sep 18 06:55:28 server kernel: [309106.876907] amdgpu 0000:06:00.0: amdgpu:
Sep 18 06:55:28 server kernel: [309106.876907] last message was failed ret is 65535
Sep 18 06:55:28 server kernel: [309106.925107] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <vce_v3_0> failed -110
Sep 18 06:55:28 server kernel: [309106.931467] amdgpu 0000:06:00.0: amdgpu:
Sep 18 06:55:28 server kernel: [309106.931467] last message was failed ret is 65535
Sep 18 06:55:28 server kernel: [309106.931477] amdgpu 0000:06:00.0: amdgpu:
Sep 18 06:55:28 server kernel: [309106.931477] last message was failed ret is 65535
Sep 18 06:55:28 server kernel: [309106.931482] amdgpu 0000:06:00.0: amdgpu:
Sep 18 06:55:28 server kernel: [309106.931482] last message was failed ret is 65535
Sep 18 06:55:28 server kernel: [309106.931486] amdgpu 0000:06:00.0: amdgpu:
Sep 18 06:55:28 server kernel: [309106.931486] last message was failed ret is 65535
Sep 18 06:55:28 server kernel: [309106.931492] amdgpu 0000:06:00.0: amdgpu:
Sep 18 06:55:28 server kernel: [309106.931492] last message was failed ret is 65535
Sep 18 06:55:28 server kernel: [309106.931497] amdgpu 0000:06:00.0: amdgpu:
Sep 18 06:55:28 server kernel: [309106.931497] last message was failed ret is 65535
Sep 18 06:55:28 server kernel: [309106.931502] amdgpu 0000:06:00.0: amdgpu:
Sep 18 06:55:28 server kernel: [309106.931502] last message was failed ret is 65535
Sep 18 06:55:28 server kernel: [309106.931508] amdgpu 0000:06:00.0: amdgpu:
Sep 18 06:55:28 server kernel: [309106.931508] last message was failed ret is 65535
Sep 18 06:55:28 server kernel: [309106.931512] amdgpu 0000:06:00.0: amdgpu:
Sep 18 06:55:28 server kernel: [309106.931512] last message was failed ret is 65535
Sep 18 06:55:28 server kernel: [309106.931516] amdgpu 0000:06:00.0: amdgpu:
Sep 18 06:55:28 server kernel: [309106.931516] last message was failed ret is 65535
Sep 18 06:55:28 server kernel: [309107.315714] amdgpu 0000:06:00.0: amdgpu:
Sep 18 06:55:28 server kernel: [309107.315714] last message was failed ret is 65535
Sep 18 06:55:28 server kernel: [309107.315723] amdgpu 0000:06:00.0: amdgpu:
Sep 18 06:55:28 server kernel: [309107.315723] last message was failed ret is 65535
Sep 18 06:55:28 server kernel: [309107.315729] amdgpu 0000:06:00.0: amdgpu:
Sep 18 06:55:28 server kernel: [309107.315729] last message was failed ret is 65535
Sep 18 06:55:28 server kernel: [309107.315735] amdgpu 0000:06:00.0: amdgpu:
Sep 18 06:55:28 server kernel: [309107.315735] last message was failed ret is 65535
Sep 18 06:55:28 server kernel: [309107.315740] amdgpu 0000:06:00.0: amdgpu:
Sep 18 06:55:28 server kernel: [309107.315740] last message was failed ret is 65535
Sep 18 06:55:28 server kernel: [309107.315744] amdgpu 0000:06:00.0: amdgpu:
Sep 18 06:55:28 server kernel: [309107.315744] last message was failed ret is 65535
Sep 18 06:55:28 server kernel: [309107.315749] amdgpu 0000:06:00.0: amdgpu:
Sep 18 06:55:28 server kernel: [309107.315749] last message was failed ret is 65535
Sep 18 06:55:28 server kernel: [309107.315760] amdgpu 0000:06:00.0: amdgpu:
Sep 18 06:55:28 server kernel: [309107.315760] last message was failed ret is 65535
Sep 18 06:55:28 server kernel: [309107.315764] amdgpu: Failed to force to switch arbf0!
Sep 18 06:55:28 server kernel: [309107.315766] amdgpu: [disable_dpm_tasks] Failed to disable DPM!
Sep 18 06:55:28 server kernel: [309107.315768] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <powerplay> failed -22
Sep 18 06:55:29 server kernel: [309107.508098] amdgpu 0000:06:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_0.2.1.0 test failed (-110)
Sep 18 06:55:29 server kernel: [309107.508324] [drm:gfx_v8_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
Sep 18 06:55:29 server kernel: [309107.892705] amdgpu: cp is busy, skip halt cp
Sep 18 06:55:29 server kernel: [309108.084690] amdgpu: rlc is busy, skip halt rlc
Sep 18 06:55:29 server kernel: [309108.277758] amdgpu 0000:06:00.0: amdgpu: BACO reset
Sep 18 06:55:33 server gnome-shell[4403]: amdgpu: The CS has been rejected (-125), but the context isn't robust.
Sep 18 06:55:33 server kernel: [309111.852923] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Sep 18 06:55:33 server gnome-shell[4403]: amdgpu: The process will be terminated.
Sep 18 06:55:38 server gnome-shell[4073]: Connection to xwayland lost
Sep 18 06:55:38 server pulseaudio[3903]: X11 I/O error handler called
Sep 18 06:55:38 server pulseaudio[3903]: X11 I/O error exit handler called, preparing to tear down X11 modules
Sep 18 06:55:38 server net.kvirc.KVIrc5.desktop[6208]: The X11 connection broke (error 1). Did the X11 server die?
Sep 18 06:56:12 server kernel: [309151.148572] watchdog: BUG: soft lockup - CPU#10 stuck for 26s! [kworker/u64:0:117461]
Sep 18 06:56:12 server kernel: [309151.148580] Modules linked in: vhost_net vhost vhost_iotlb tap xt_CHECKSUM xt_MASQUERADE nft_chain_nat nf_nat ip6t_REJECT nf_reject_ipv6 xt_hl ip6t_rt ipt_REJECT nf_reject_ipv4 xt_LOG nf_log_syslog xt_multiport nft_limit xt_limit xt_addrtype xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables libcrc32c nfnetlink binfmt_misc nls_iso8859_1 snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel joydev snd_intel_dspcfg input_leds snd_intel_sdw_acpi snd_hda_codec snd_hda_core intel_rapl_msr snd_hwdep intel_rapl_common snd_pcm snd_seq_midi snd_seq_midi_event edac_mce_amd snd_rawmidi snd_seq snd_seq_device kvm_amd snd_timer kvm snd soundcore irqbypass ccp k10temp rapl gigabyte_wmi wmi_bmof mac_hid sch_fq_codel iptable_filter ip6table_filter ip6_tables br_netfilter bridge stp llc arp_tables msr parport_pc ppdev lp parport efi_pstore ip_tables x_tables autofs4 dm_crypt hid_logitech_hidpp hid_logitech_dj amdgpu hid_generic amdxcp usbhid drm_exec gpu_sched hid drm_buddy
Sep 18 06:56:12 server kernel: [309151.148675] i2c_algo_bit drm_suballoc_helper drm_ttm_helper ttm crct10dif_pclmul drm_display_helper crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha256_ssse3 sha1_ssse3 cec i2c_piix4 rc_core r8169 ahci video realtek xhci_pci libahci xhci_pci_renesas wmi gpio_amdpt aesni_intel crypto_simd cryptd
Sep 18 06:56:12 server kernel: [309151.148705] CPU: 10 PID: 117461 Comm: kworker/u64:0 Not tainted 6.8.0-40-generic #40~22.04.3-Ubuntu
Sep 18 06:56:12 server kernel: [309151.148708] Hardware name: Gigabyte Technology Co., Ltd. B450M DS3H/B450M DS3H-CF, BIOS F63b 05/11/2022
Sep 18 06:56:12 server kernel: [309151.148710] Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
Sep 18 06:56:12 server kernel: [309151.148720] RIP: 0010:amdgpu_device_rreg+0xf3/0x120 [amdgpu]
Sep 18 06:56:12 server kernel: [309151.148976] Code: 75 1d f6 83 e8 13 04 00 10 74 14 48 8b 83 d8 2f 04 00 48 8d 78 18 e8 1c 42 bf e0 85 c0 75 10 4c 03 a3 e0 08 00 00 45 8b 24 24 <e9> 63 ff ff ff 48 89 df 31 d2 44 89 f6 e8 9b b6 13 00 41 89 c4 48
Sep 18 06:56:12 server kernel: [309151.148978] RSP: 0018:ffffb0b51257f9b8 EFLAGS: 00000282
Sep 18 06:56:12 server kernel: [309151.148981] RAX: ffffffffc05b3f70 RBX: ffff9ac42c700000 RCX: 000000000000bfcc
Sep 18 06:56:12 server kernel: [309151.148983] RDX: 0000000000000000 RSI: 0000000000007670 RDI: ffff9ac42c700000
Sep 18 06:56:12 server kernel: [309151.148984] RBP: ffffb0b51257f9e0 R08: ffff9ac402022600 R09: ffff9ac426fd0000
Sep 18 06:56:12 server kernel: [309151.148986] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000ffffffff
Sep 18 06:56:12 server kernel: [309151.148988] R13: 0000000000000000 R14: 0000000000001d9c R15: 0000000000000006
Sep 18 06:56:12 server kernel: [309151.148989] FS: 0000000000000000(0000) GS:ffff9acb1e500000(0000) knlGS:0000000000000000
Sep 18 06:56:12 server kernel: [309151.148991] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 18 06:56:12 server kernel: [309151.148993] CR2: 00005aae05e39cd0 CR3: 0000000038a3c000 CR4: 0000000000350ef0
Sep 18 06:56:12 server kernel: [309151.148995] Call Trace:
Sep 18 06:56:12 server kernel: [309151.148998] <IRQ>
Sep 18 06:56:12 server kernel: [309151.149001] ? show_regs+0x6d/0x80
Sep 18 06:56:12 server kernel: [309151.149006] ? watchdog_timer_fn+0x206/0x290
Sep 18 06:56:12 server kernel: [309151.149010] ? __pfx_watchdog_timer_fn+0x10/0x10
Sep 18 06:56:12 server kernel: [309151.149014] ? __hrtimer_run_queues+0x112/0x2a0
Sep 18 06:56:12 server kernel: [309151.149017] ? srso_return_thunk+0x5/0x5f
Sep 18 06:56:12 server kernel: [309151.149023] ? hrtimer_interrupt+0xf6/0x250
Sep 18 06:56:12 server kernel: [309151.149028] ? __sysvec_apic_timer_interrupt+0x51/0x150
Sep 18 06:56:12 server kernel: [309151.149032] ? sysvec_apic_timer_interrupt+0x8d/0xd0
Sep 18 06:56:12 server kernel: [309151.149036] </IRQ>
Sep 18 06:56:12 server kernel: [309151.149037] <TASK>
Sep 18 06:56:12 server kernel: [309151.149039] ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
Sep 18 06:56:12 server kernel: [309151.149044] ? __pfx_cail_reg_read+0x10/0x10 [amdgpu]
Sep 18 06:56:12 server kernel: [309151.149270] ? amdgpu_device_rreg+0xf3/0x120 [amdgpu]
Sep 18 06:56:12 server kernel: [309151.149493] ? srso_return_thunk+0x5/0x5f
Sep 18 06:56:12 server kernel: [309151.149497] cail_reg_read+0x17/0x30 [amdgpu]
Sep 18 06:56:12 server kernel: [309151.149721] atom_get_src_int+0x64b/0x6f0 [amdgpu]
Sep 18 06:56:12 server kernel: [309151.149948] atom_op_test+0x76/0x1b0 [amdgpu]
Sep 18 06:56:12 server kernel: [309151.150174] amdgpu_atom_execute_table_locked+0x174/0x3d0 [amdgpu]
Sep 18 06:56:12 server kernel: [309151.150401] atom_op_calltable+0xd3/0x160 [amdgpu]
Sep 18 06:56:12 server kernel: [309151.150627] amdgpu_atom_execute_table_locked+0x174/0x3d0 [amdgpu]
Sep 18 06:56:12 server kernel: [309151.150854] amdgpu_atom_asic_init+0x133/0x180 [amdgpu]
Sep 18 06:56:12 server kernel: [309151.151081] amdgpu_device_asic_init+0x74/0x90 [amdgpu]
Sep 18 06:56:12 server kernel: [309151.151306] amdgpu_do_asic_reset.part.0+0x12f/0x4e0 [amdgpu]
Sep 18 06:56:12 server kernel: [309151.151529] amdgpu_do_asic_reset+0xac/0x110 [amdgpu]
Sep 18 06:56:12 server kernel: [309151.151752] amdgpu_device_gpu_recover+0x4ab/0x930 [amdgpu]
Sep 18 06:56:12 server kernel: [309151.151976] amdgpu_job_timedout+0x182/0x270 [amdgpu]
Sep 18 06:56:12 server kernel: [309151.152254] drm_sched_job_timedout+0x70/0x110 [gpu_sched]
Sep 18 06:56:12 server kernel: [309151.152261] process_one_work+0x16f/0x350
Sep 18 06:56:12 server kernel: [309151.152266] worker_thread+0x306/0x440
Sep 18 06:56:12 server kernel: [309151.152271] ? __pfx_worker_thread+0x10/0x10
Sep 18 06:56:12 server kernel: [309151.152274] kthread+0xf2/0x120
Sep 18 06:56:12 server kernel: [309151.152277] ? __pfx_kthread+0x10/0x10
Sep 18 06:56:12 server kernel: [309151.152280] ret_from_fork+0x47/0x70
Sep 18 06:56:12 server kernel: [309151.152283] ? __pfx_kthread+0x10/0x10
Sep 18 06:56:12 server kernel: [309151.152286] ret_from_fork_asm+0x1b/0x30
Sep 18 06:56:12 server kernel: [309151.152293] </TASK>
Sep 18 06:56:25 server kernel: [309163.570987] [drm:atom_op_jump [amdgpu]] *ERROR* atombios stuck in loop for more than 20secs aborting
Sep 18 06:56:25 server kernel: [309163.571268] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing A9BC (len 158, WS 0, PS 8) @ 0xA9D3
Sep 18 06:56:25 server kernel: [309163.571497] amdgpu 0000:06:00.0: amdgpu: asic atom init failed!
Sep 18 06:56:25 server kernel: [309163.571518] amdgpu 0000:06:00.0: amdgpu: GPU reset(2) failed
Sep 18 06:56:25 server kernel: [309163.571591] snd_hda_intel 0000:06:00.1: Unable to change power state from D3hot to D0, device inaccessible
Sep 18 06:56:25 server kernel: [309163.736749] snd_hda_intel 0000:06:00.1: CORB reset timeout#2, CORBRP = 65535
Sep 18 06:56:25 server kernel: [309163.736763] amdgpu 0000:06:00.0: amdgpu: GPU reset end with ret = -22
Sep 18 06:56:25 server kernel: [309163.736765] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -22
Sep 18 06:56:35 server kernel: [309173.935423] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=26658, emitted seq=26660
Sep 18 06:56:35 server kernel: [309173.935982] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
Sep 18 06:56:35 server kernel: [309173.936526] amdgpu 0000:06:00.0: amdgpu: GPU reset begin!
[...]
Sep 18 07:00:25 server kernel: [309403.823763] ret_from_fork+0x47/0x70
Sep 18 07:00:25 server kernel: [309403.823768] ? __pfx_kthread+0x10/0x10
Sep 18 07:00:25 server kernel: [309403.823773] ret_from_fork_asm+0x1b/0x30
Sep 18 07:00:25 server kernel: [309403.823785] </TASK>
Sep 18 07:00:25 server kernel: [309403.823787] Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

--
black screen after amdgpu crash (ring gfx timeout)
https://bugs.launchpad.net/bugs/2081092
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp