*** This bug is a duplicate of bug 1956845 *** https://bugs.launchpad.net/bugs/1956845
Hardware: dual-socket AMD EPYC 7763 (Milan) node with 2x AMD Instinct MI100
OS: Ubuntu 18.04.6 LTS
Kernel: linux-hwe 5.4.0-107-generic
ROCm: 5.1.0
AMDGPU driver version: 5.13.20.5.1
Software: homegrown code developed with ROCm 5.1.0.

Might this be related?

Logs:
[304726.475355] beegfs: enabling unsafe global rkey
[304734.912424] amdgpu 0000:23:00.0: amdgpu: [gfxhub0] no-retry page fault (src_id:0 ring:24 vmid:3 pasid:32769, for process hyprep pid 122284 thread hyprep pid 122284)
[304734.928526] amdgpu 0000:23:00.0: amdgpu: in page starting at address 0x0000000001753000 from IH client 0x1b (UTCL2)
[304734.939972] amdgpu 0000:23:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00301031
[304734.948130] amdgpu 0000:23:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
[304734.955858] amdgpu 0000:23:00.0: amdgpu: MORE_FAULTS: 0x1
[304734.962115] amdgpu 0000:23:00.0: amdgpu: WALKER_ERROR: 0x0
[304734.968441] amdgpu 0000:23:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[304734.975196] amdgpu 0000:23:00.0: amdgpu: MAPPING_ERROR: 0x0
[304734.981580] amdgpu 0000:23:00.0: amdgpu: RW: 0x0
[304735.568400] amdgpu 0000:23:00.0: amdgpu: [gfxhub0] no-retry page fault (src_id:0 ring:24 vmid:3 pasid:32769, for process pid 0 thread pid 0)
[304735.582318] amdgpu 0000:23:00.0: amdgpu: in page starting at address 0x0000000001753000 from IH client 0x1b (UTCL2)
[304735.593722] amdgpu 0000:23:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[304735.601851] amdgpu 0000:23:00.0: amdgpu: Faulty UTCL2 client ID: CB (0x0)
[304735.609465] amdgpu 0000:23:00.0: amdgpu: MORE_FAULTS: 0x0
[304735.615686] amdgpu 0000:23:00.0: amdgpu: WALKER_ERROR: 0x0
[304735.621994] amdgpu 0000:23:00.0: amdgpu: PERMISSION_FAULTS: 0x0
[304735.628737] amdgpu 0000:23:00.0: amdgpu: MAPPING_ERROR: 0x0
[304735.635104] amdgpu 0000:23:00.0: amdgpu: RW: 0x0
[321839.599489] beegfs: enabling unsafe global rkey

Driver log:
Apr 02 22:19:59 n004 kernel: [drm] amdgpu kernel modesetting enabled.
Apr 02 22:19:59 n004 kernel: [drm] amdgpu version: 5.13.20.5.1
Apr 02 22:19:59 n004 kernel: amdgpu: Ignoring ACPI CRAT on non-APU system
Apr 02 22:19:59 n004 kernel: amdgpu: Virtual CRAT table created for CPU
Apr 02 22:19:59 n004 kernel: amdgpu: Topology: Add CPU node
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: remove_conflicting_pci_framebuffers: bar 0: 0x67800000000 -> 0x67fffffffff
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: remove_conflicting_pci_framebuffers: bar 2: 0x68000000000 -> 0x680001fffff
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: remove_conflicting_pci_framebuffers: bar 5: 0xeb400000 -> 0xeb47ffff
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: enabling device (0000 -> 0003)
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: amdgpu: Fetched VBIOS from ROM BAR
Apr 02 22:19:59 n004 kernel: amdgpu: ATOM BIOS: 113-D3431401-100
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: amdgpu: MEM ECC is active.
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: amdgpu: SRAM ECC is active.
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: amdgpu: RAS INFO: ras initialized successfully, hardware ability[7fff] ras_mask[7fff]
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: amdgpu: VRAM: 32752M 0x0000008000000000 - 0x00000087FEFFFFFF (32752M used)
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: amdgpu: AGP: 267878400M 0x0000008800000000 - 0x0000FFFFFFFFFFFF
Apr 02 22:19:59 n004 kernel: [drm] amdgpu: 32752M of VRAM memory ready
Apr 02 22:19:59 n004 kernel: [drm] amdgpu: 2064153M of GTT memory ready.
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: amdgpu: PSP runtime database doesn't exist
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: amdgpu: Will use PSP to load VCN firmware
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: amdgpu: HDCP: optional hdcp ta ucode is not available
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: amdgpu: DTM: optional dtm ta ucode is not available
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: amdgpu: RAP: optional rap ta ucode is not available
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: amdgpu: use vbios provided pptable
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: amdgpu: smc_dpm_info table revision(format.content): 4.6
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: amdgpu: PMFW based fan control disabled
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: amdgpu: SMU is initialized successfully!
Apr 02 22:19:59 n004 kernel: kfd kfd: amdgpu: Allocated 3969056 bytes on gart
Apr 02 22:19:59 n004 kernel: amdgpu: Virtual CRAT table created for GPU
Apr 02 22:19:59 n004 kernel: amdgpu: Topology: Add dGPU node [0x738c:0x1002]
Apr 02 22:19:59 n004 kernel: kfd kfd: amdgpu: added device 1002:738c
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: amdgpu: SE 8, SH per SE 1, CU per SH 16, active_cu_number 120
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 0 on hub 0
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 1 on hub 0
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 4 on hub 0
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 5 on hub 0
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 6 on hub 0
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 7 on hub 0
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 8 on hub 0
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 9 on hub 0
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 10 on hub 0
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: amdgpu: ring sdma0 uses VM inv eng 0 on hub 1
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: amdgpu: ring sdma1 uses VM inv eng 1 on hub 1
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: amdgpu: ring sdma2 uses VM inv eng 4 on hub 1
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: amdgpu: ring sdma3 uses VM inv eng 5 on hub 1
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: amdgpu: ring sdma4 uses VM inv eng 6 on hub 1
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: amdgpu: ring sdma5 uses VM inv eng 0 on hub 2
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: amdgpu: ring sdma6 uses VM inv eng 1 on hub 2
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: amdgpu: ring sdma7 uses VM inv eng 4 on hub 2
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 5 on hub 2
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 6 on hub 2
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 7 on hub 2
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: amdgpu: ring vcn_dec_1 uses VM inv eng 8 on hub 2
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: amdgpu: ring vcn_enc_1.0 uses VM inv eng 9 on hub 2
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: amdgpu: ring vcn_enc_1.1 uses VM inv eng 10 on hub 2
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: amdgpu: ring jpeg_dec_0 uses VM inv eng 11 on hub 2
Apr 02 22:19:59 n004 kernel: amdgpu 0000:23:00.0: amdgpu: ring jpeg_dec_1 uses VM inv eng 12 on hub 2
Apr 02 22:19:59 n004 kernel: amdgpu: Detected AMDGPU 6 Perf Events.
Apr 02 22:19:59 n004 kernel: [drm] Initialized amdgpu 3.45.0 20150101 for 0000:23:00.0 on minor 1
Apr 02 22:21:38 n004 kernel: amdgpu: PeerDirect support was initialized successfully

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1940690

Title:
amdgpu kernel crash

Status in linux package in Ubuntu:
Confirmed

Bug description:
[ 0.000000] Linux version 5.4.0-81-generic (buildd@lgw01-amd64-052) (gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)) #91-Ubuntu SMP Thu Jul 15 19:09:17 UTC 2021 (Ubuntu 5.4.0-81.91-generic 5.4.128)
[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-5.4.0-81-generic root=UUID=2fac2ccc-b353-4ced-a8e5-7e5a7f0fe5f3 ro
[ 217.837643] amdgpu 0000:09:00.0: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:3 pasid:32772, for process Xorg pid 7499 thread Xorg:cs0 pid 7500)
[ 217.837647] amdgpu 0000:09:00.0: in page starting at address 0x000080010797e000 from client 27
[ 217.837649] amdgpu 0000:09:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00301031
[ 217.837651] amdgpu 0000:09:00.0: MORE_FAULTS: 0x1
[ 217.837653] amdgpu 0000:09:00.0: WALKER_ERROR: 0x0
[ 217.837655] amdgpu 0000:09:00.0: PERMISSION_FAULTS: 0x3
[ 217.837657] amdgpu 0000:09:00.0: MAPPING_ERROR: 0x0
[ 217.837658] amdgpu 0000:09:00.0: RW: 0x0
[ 217.837668] amdgpu 0000:09:00.0: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:3 pasid:32772, for process Xorg pid 7499 thread Xorg:cs0 pid 7500)
[ 217.837670] amdgpu 0000:09:00.0: in page starting at address 0x00008001079a6000 from client 27
[ 217.837672] amdgpu 0000:09:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00301031
[ 217.837674] amdgpu 0000:09:00.0: MORE_FAULTS: 0x1
[ 217.837675] amdgpu 0000:09:00.0: WALKER_ERROR: 0x0
[ 217.837677] amdgpu 0000:09:00.0: PERMISSION_FAULTS: 0x3
[ 217.837679] amdgpu 0000:09:00.0: MAPPING_ERROR: 0x0
[ 217.837681] amdgpu 0000:09:00.0: RW: 0x0
[ 217.837698] amdgpu 0000:09:00.0: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:3 pasid:32772, for process Xorg pid 7499 thread Xorg:cs0 pid 7500)
[ 217.837703] amdgpu 0000:09:00.0: in page starting at address 0x00008001079ce000 from client 27
[ 217.837708] amdgpu 0000:09:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00301031
[ 217.837712] amdgpu 0000:09:00.0: MORE_FAULTS: 0x1
[ 217.837716] amdgpu 0000:09:00.0: WALKER_ERROR: 0x0
[ 217.837721] amdgpu 0000:09:00.0: PERMISSION_FAULTS: 0x3
[ 217.837725] amdgpu 0000:09:00.0: MAPPING_ERROR: 0x0
[ 217.837729] amdgpu 0000:09:00.0: RW: 0x0
[ 217.838669] amdgpu 0000:09:00.0: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:3 pasid:32772, for process Xorg pid 7499 thread Xorg:cs0 pid 7500)
[ 217.838673] amdgpu 0000:09:00.0: in page starting at address 0x000080010797e000 from client 27
[ 217.838675] amdgpu 0000:09:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00301031
[ 217.838677] amdgpu 0000:09:00.0: MORE_FAULTS: 0x1
[ 217.838679] amdgpu 0000:09:00.0: WALKER_ERROR: 0x0
[ 217.838681] amdgpu 0000:09:00.0: PERMISSION_FAULTS: 0x3
[ 217.838683] amdgpu 0000:09:00.0: MAPPING_ERROR: 0x0
[ 217.838684] amdgpu 0000:09:00.0: RW: 0x0
[ 217.838695] amdgpu 0000:09:00.0: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:3 pasid:32772, for process Xorg pid 7499 thread Xorg:cs0 pid 7500)
[ 217.838697] amdgpu 0000:09:00.0: in page starting at address 0x00008001079a6000 from client 27
[ 217.838699] amdgpu 0000:09:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00301031
[ 217.838701] amdgpu 0000:09:00.0: MORE_FAULTS: 0x1
[ 217.838703] amdgpu 0000:09:00.0: WALKER_ERROR: 0x0
[ 217.838704] amdgpu 0000:09:00.0: PERMISSION_FAULTS: 0x3
[ 217.838706] amdgpu 0000:09:00.0: MAPPING_ERROR: 0x0
[ 217.838708] amdgpu 0000:09:00.0: RW: 0x0
[ 217.838727] amdgpu 0000:09:00.0: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:3 pasid:32772, for process Xorg pid 7499 thread Xorg:cs0 pid 7500)
[ 217.838732] amdgpu 0000:09:00.0: in page starting at address 0x00008001079ce000 from client 27
[ 217.838736] amdgpu 0000:09:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00301031
[ 217.838741] amdgpu 0000:09:00.0: MORE_FAULTS: 0x1
[ 217.838746] amdgpu 0000:09:00.0: WALKER_ERROR: 0x0
[ 217.838750] amdgpu 0000:09:00.0: PERMISSION_FAULTS: 0x3
[ 217.838755] amdgpu 0000:09:00.0: MAPPING_ERROR: 0x0
[ 217.838759] amdgpu 0000:09:00.0: RW: 0x0
[ 217.839694] amdgpu 0000:09:00.0: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:3 pasid:32772, for process Xorg pid 7499 thread Xorg:cs0 pid 7500)
[ 217.839698] amdgpu 0000:09:00.0: in page starting at address 0x000080010797e000 from client 27
[ 217.839700] amdgpu 0000:09:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00301031
[ 217.839702] amdgpu 0000:09:00.0: MORE_FAULTS: 0x1
[ 217.839703] amdgpu 0000:09:00.0: WALKER_ERROR: 0x0
[ 217.839705] amdgpu 0000:09:00.0: PERMISSION_FAULTS: 0x3
[ 217.839707] amdgpu 0000:09:00.0: MAPPING_ERROR: 0x0
[ 217.839709] amdgpu 0000:09:00.0: RW: 0x0
[ 217.992553] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: xorg 1:7.7+19ubuntu14
ProcVersionSignature: Ubuntu 5.4.0-80.90-generic 5.4.124
Uname: Linux 5.4.0-80-generic x86_64
ApportVersion: 2.20.11-0ubuntu27.18
Architecture: amd64
BootLog:
CasperMD5CheckResult: skip
CompositorRunning: None
Date: Fri Aug 20 17:39:16 2021
DistUpgraded: 2020-11-08 09:01:34,443 ERROR got error from PostInstallScript ./xorg_fix_proprietary.py (g-exec-error-quark: Failed to execute child process "./xorg_fix_proprietary.py" (No such file or directory) (8))
DistroCodename: focal
DistroVariant: ubuntu
ExtraDebuggingInterest: Yes
GraphicsCard:
 Advanced Micro Devices, Inc. [AMD/ATI] Picasso [1002:15d8] (rev c8) (prog-if 00 [VGA controller])
   Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Picasso [1002:15d8]
InstallationDate: Installed on 2020-10-31 (293 days ago)
InstallationMedia: Ubuntu 14.04.4 LTS "Trusty Tahr" - Release amd64 (20160217.1)
MachineType: To Be Filled By O.E.M. To Be Filled By O.E.M.
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.4.0-80-generic root=UUID=2fac2ccc-b353-4ced-a8e5-7e5a7f0fe5f3 ro
Renderer: Software
SourcePackage: xorg
Symptom: display
Title: Xorg crash
UnitySupportTest:
 Error: command ['/usr/lib/nux/unity_support_test', '-p', '-f'] failed with exit code 5: Error: unable to open display
UpgradeStatus: Upgraded to focal on 2020-11-08 (285 days ago)
XorgConf:
 Section "InputClass"
     Identifier "middle button emulation class"
     MatchIsPointer "on"
     Option "Emulate3Buttons" "on"
 EndSection
dmi.bios.date: 06/18/2020
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: P4.20
dmi.board.name: B450 Pro4
dmi.board.vendor: ASRock
dmi.chassis.asset.tag: To Be Filled By O.E.M.
dmi.chassis.type: 3
dmi.chassis.vendor: To Be Filled By O.E.M.
dmi.chassis.version: To Be Filled By O.E.M.
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrP4.20:bd06/18/2020:svnToBeFilledByO.E.M.:pnToBeFilledByO.E.M.:pvrToBeFilledByO.E.M.:rvnASRock:rnB450Pro4:rvr:cvnToBeFilledByO.E.M.:ct3:cvrToBeFilledByO.E.M.:
dmi.product.family: To Be Filled By O.E.M.
dmi.product.name: To Be Filled By O.E.M.
dmi.product.sku: To Be Filled By O.E.M.
dmi.product.version: To Be Filled By O.E.M.
dmi.sys.vendor: To Be Filled By O.E.M.
mtime.conffile..etc.apport.crashdb.conf: 2021-08-13T08:27:44.287879
version.compiz: compiz 1:0.9.14.1+20.04.20200211-0ubuntu1
version.libdrm2: libdrm2 2.4.105-3~20.04.1
version.libgl1-mesa-dri: libgl1-mesa-dri 21.0.3-0ubuntu0.3~20.04.1
version.libgl1-mesa-glx: libgl1-mesa-glx 21.0.3-0ubuntu0.3~20.04.1
version.xserver-xorg-core: xserver-xorg-core 2:1.20.11-1ubuntu1~20.04.2
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev 1:2.10.6-1
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:19.1.0-1
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.99.917+git20200226-1
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:1.0.16-1
xserver.bootTime: Sat Nov 7 23:42:22 2020
xserver.configfile: default
xserver.devices:
 input Power Button KEYBOARD, id 6
 input Video Bus KEYBOARD, id 7
 input Power Button KEYBOARD, id 8
 input Dell Dell USB Keyboard KEYBOARD, id 9
 input PixArt Cherry USB Optical Mouse MOUSE, id 10
xserver.errors:
 open /dev/dri/card0: No such file or directory
 open /dev/dri/card0: No such file or directory
 Screen 0 deleted because of no matching config section.
 AIGLX: reverting to software rendering
xserver.logfile: /var/log/Xorg.0.log
xserver.outputs:
xserver.version: 2:1.19.6-1ubuntu4.7
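Both the MI100 comment and the quoted Picasso report log VM_L2_PROTECTION_FAULT_STATUS:0x00301031. As a cross-check on what the driver prints, the Python sketch below splits that 32-bit value into named bit fields. The bit positions are an assumption: they follow the GFX9-family register layout (MI100/Arcturus and Picasso/Raven are both GFX9 parts) and are not stated in either report. They do, however, reproduce the values the kernel decoded itself: MORE_FAULTS 0x1, WALKER_ERROR 0x0, PERMISSION_FAULTS 0x3, MAPPING_ERROR 0x0, RW 0x0, client ID 0x8 (printed as TCP in the MI100 log) and vmid 3.

```python
from typing import Dict, List, Tuple

# Assumed GFX9-family layout of VM_L2_PROTECTION_FAULT_STATUS:
# (field name, bit shift, field width in bits). Not taken from the bug report.
FAULT_STATUS_FIELDS: List[Tuple[str, int, int]] = [
    ("MORE_FAULTS", 0, 1),
    ("WALKER_ERROR", 1, 3),
    ("PERMISSION_FAULTS", 4, 4),
    ("MAPPING_ERROR", 8, 1),
    ("CID", 9, 9),  # UTCL2 client ID; the MI100 log names 0x8 as TCP
    ("RW", 18, 1),
    ("VMID", 20, 4),
]


def decode_fault_status(status: int) -> Dict[str, int]:
    """Split a 32-bit VM_L2_PROTECTION_FAULT_STATUS value into named fields."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, shift, width in FAULT_STATUS_FIELDS
    }


if __name__ == "__main__":
    # The two status values that appear in the MI100 fault log.
    for raw in (0x00301031, 0x00000000):
        fields = decode_fault_status(raw)
        print(f"0x{raw:08X}: " + ", ".join(f"{k}=0x{v:X}" for k, v in fields.items()))
    # 0x00301031: MORE_FAULTS=0x1, WALKER_ERROR=0x0, PERMISSION_FAULTS=0x3, MAPPING_ERROR=0x0, CID=0x8, RW=0x0, VMID=0x3
```

The second MI100 entry (status 0x00000000, pid 0) decodes to all-zero fields, which matches the zeros and the client ID CB (0x0) that the driver printed for that follow-up fault.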