Disabling CONFIG_HSA_AMD_SVM solved my problem. Now Radeon RX 6800 works fine on my ARM server (Hardware name: Think-Force Technology Universal Server/7140 Advanced, BIOS 1.1.7 20230216). Thank you so much.
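For anyone who lands on this thread with the same Oops, it may help to confirm that the option is really off in the config the kernel was built from. The following is only a minimal sketch: it assumes the usual kernel .config text format (`CONFIG_FOO=y`, or `# CONFIG_FOO is not set` for a disabled option). The helper name `kconfig_state` and the two-line sample fragment are made up for illustration and are not taken from the attached config file.

```python
import re


def kconfig_state(config_text: str, option: str) -> str:
    """Return the value assigned to `option` in .config text, or "n" if unset.

    Assumes the conventional .config layout: enabled options appear as
    "CONFIG_FOO=<value>", disabled ones as "# CONFIG_FOO is not set".
    """
    pattern = re.compile(rf"^{re.escape(option)}=(.*)$", re.MULTILINE)
    match = pattern.search(config_text)
    return match.group(1) if match else "n"


# Hypothetical fragment, for illustration only (not the poster's real config).
sample = """\
CONFIG_HSA_AMD=y
# CONFIG_HSA_AMD_SVM is not set
"""

print("CONFIG_HSA_AMD_SVM:", kconfig_state(sample, "CONFIG_HSA_AMD_SVM"))
print("CONFIG_HSA_AMD:", kconfig_state(sample, "CONFIG_HSA_AMD"))
```

To check a real build, read the text of the .config file in the kernel build directory and pass it in place of `sample`.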
------------------------------------------------------------------
Sender:Lang Yu <lang...@amd.com>
Sent At:2023 Jun. 9 (Fri.) 16:37
Recipient:彭逸豪 <pengyi...@linzhuotech.com>
Cc:alexander.deucher <alexander.deuc...@amd.com>; amd-gfx <amd-gfx@lists.freedesktop.org>
Subject:Re: Radeon RX 6800 does not work properly on Think-Force 7140 ARM server (generates Oops and causes system deadlock)

Try to disable CONFIG_HSA_AMD_SVM in your kernel config.

Regards,
Lang

On 06/09/ , 彭逸豪 wrote:
> I have a Radeon RX 6800 and want to use it on my ARM server (Hardware name:
> Think-Force Technology Universal Server/7140 Advanced, BIOS 1.1.7 20230216).
> However the presence of this GPU can cause kernel Oops or panic (some older
> versions). Even if the kernel does not panic, the system will fall into a
> "deadlock" state and cannot log in normally.
>
> Below is the Oops of 6.4.0-rc5 (the full log from the serial port is
> attached). After that, the GPU cannot be used normally, and the system is
> stuck in a "deadlock" state, and cannot log in normally after entering the
> user name. If the GPU is removed or replaced with another GPU such as an
> older Radeon RX 560, the system can log in normally. Radeon RX 560 works fine
> in 6.4.0-rc5.
>
> I have tried multiple versions of the kernel, from 5.15 to 6.4.0-rc5, they
> all have similar Oops or panic, and the GPU cannot be used, and the system
> cannot be logged in normally.
>
> The attachment contains the full kernel log captured from the serial port,
> and my 6.4-rc5 config file. Please let me know if additional information is
> needed.
>
> (Note: The previous email did not have the correct subject, so I retracted
> it. I am sorry if you have received duplicate emails.)
>
> [ 6.535108] cma: cma_alloc: reserved: alloc failed, req-size: 2 pages,
> ret: -12
> [ 9.824070] Unable to handle kernel paging request at virtual address
> ffffffffffe00034
> [ 9.831955] Mem abort info:
> [ 9.834737] ESR = 0x0000000096000046
> [ 9.838469] EC = 0x25: DABT (current EL), IL = 32 bits
> [ 9.843756] SET = 0, FnV = 0
> [ 9.846794] EA = 0, S1PTW = 0
> [ 9.849919] FSC = 0x06: level 2 translation fault
> [ 9.854773] Data abort info:
> [ 9.857638] ISV = 0, ISS = 0x00000046
> [ 9.861454] CM = 0, WnR = 1
> [ 9.864405] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000001ff5489000
> [ 9.871074] [ffffffffffe00034] pgd=10000001b078b003, p4d=10000001b078b003,
> pud=10000001b078a003, pmd=0000000000000000
> [ 9.881637] Internal error: Oops: 0000000096000046 [#1] SMP
> [ 9.887180] Modules linked in: input_leds hid_generic amdgpu(+) usbhid hid
> cdc_ether usbnet snd_hda_codec_hdmi binfmt_misc snd_hda_intel
> snd_intel_dspcfg snd_hda_codec snd_hda_core gpu_sched drm_buddy video
> snd_hwdep drm_suballoc_helper drm_ttm_helper snd_pcm ttm onboard_usb_hub
> nls_iso8859_1 drm_display_helper snd_seq_midi snd_seq_midi_event ast
> snd_rawmidi cec rc_core snd_seq drm_shmem_helper drm_kms_helper
> snd_seq_device snd_timer snd ipmi_ssif ipmi_devintf syscopyarea crct10dif_ce
> sysfillrect ipmi_msghandler soundcore arm_spe_pmu sysimgblt sch_fq_codel drm
> pstore_blk ramoops reed_solomon pstore_zone efi_pstore ip_tables x_tables
> autofs4 nvme igb nvme_core nvme_common i2c_algo_bit xhci_plat_hcd
> [ 9.948668] CPU: 0 PID: 305 Comm: kworker/0:2 Tainted: G W
> 6.4.0-rc5 #1
> [ 9.956630] Hardware name: Think-Force Technology Universal Server/7140
> Advanced, BIOS 1.1.7 20230216
> [ 9.965801] Workqueue: events work_for_cpu_fn
> [ 9.970137] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 9.977060] pc : __init_zone_device_page
> (/home/ubuntu/kernel-6.4-rc5/./include/linux/atomic/atomic-instrumented.h:42
> /home/ubuntu/kernel-6.4-rc5/./include/linux/page_ref.h:99
> /home/ubuntu/kernel-6.4-rc5/./include/linux/page_ref.h:115
> /home/ubuntu/kernel-6.4-rc5/mm/mm_init.c:557
> /home/ubuntu/kernel-6.4-rc5/mm/mm_init.c:966)
> [ 9.981826] lr : memmap_init_zone_device
> (/home/ubuntu/kernel-6.4-rc5/mm/mm_init.c:1084)
> [ 9.986677] sp : ffff80000aea3980
> [ 9.989971] x29: ffff80000aea3980 x28: 0000000000000000 x27:
> 0000000fffff8000
> [ 9.997068] x26: ffff80000a8c5f70 x25: ffff0001c118d6a0 x24:
> fffffc0000000000
> [ 10.004165] x23: 0000001000000000 x22: ffff800009bc5e98 x21:
> 0000000000000001
> [ 10.011262] x20: 0000000000000001 x19: ffffffffffe00000 x18:
> 0000000000000000
> [ 10.018360] x17: 0000000000000000 x16: 0000000000000000 x15:
> 0000000000000000
> [ 10.025457] x14: 0000000000000000 x13: 0000000000000000 x12:
> 0000000000000000
> [ 10.032554] x11: 0000000000000000 x10: 0000000000000000 x9 :
> ffff8000094032cc
> [ 10.039651] x8 : 0000000000000000 x7 : 00000000ffffffff x6 :
> 0000000000000001
> [ 10.046748] x5 : 0000000000000000 x4 : ffff0001c118d6a0 x3 :
> 0000000000000000
> [ 10.053845] x2 : 0200000000000000 x1 : 0000000fffff8000 x0 :
> ffffffffffe00000
> [ 10.060943] Call trace:
> [ 10.063373] __init_zone_device_page
> (/home/ubuntu/kernel-6.4-rc5/./include/linux/atomic/atomic-instrumented.h:42
> /home/ubuntu/kernel-6.4-rc5/./include/linux/page_ref.h:99
> /home/ubuntu/kernel-6.4-rc5/./include/linux/page_ref.h:115
> /home/ubuntu/kernel-6.4-rc5/mm/mm_init.c:557
> /home/ubuntu/kernel-6.4-rc5/mm/mm_init.c:966)
> [ 10.067791] memmap_init_zone_device
> (/home/ubuntu/kernel-6.4-rc5/mm/mm_init.c:1084)
> [ 10.072297] memremap_pages (/home/ubuntu/kernel-6.4-rc5/mm/memremap.c:270
> /home/ubuntu/kernel-6.4-rc5/mm/memremap.c:366)
> [ 10.076111] devm_memremap_pages
> (/home/ubuntu/kernel-6.4-rc5/mm/memremap.c:407)
> [ 10.080183] svm_migrate_init
> (/home/ubuntu/kernel-6.4-rc5/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_migrate.c:1029)
> amdgpu
> [ 10.085112] kgd2kfd_device_init
> (/home/ubuntu/kernel-6.4-rc5/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_device.c:647)
> amdgpu
> [ 10.090318] amdgpu_amdkfd_device_init
> (/home/ubuntu/kernel-6.4-rc5/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c:197)
> amdgpu
> [ 10.096039] amdgpu_device_init
> (/home/ubuntu/kernel-6.4-rc5/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:2537
> /home/ubuntu/kernel-6.4-rc5/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:3871)
> amdgpu
> [ 10.101329] amdgpu_driver_load_kms
> (/home/ubuntu/kernel-6.4-rc5/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c:146)
> amdgpu
> [ 10.106704] amdgpu_pci_probe
> (/home/ubuntu/kernel-6.4-rc5/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:2149)
> amdgpu
> [ 10.111648] local_pci_probe
> (/home/ubuntu/kernel-6.4-rc5/drivers/pci/pci-driver.c:325)
> [ 10.115462] work_for_cpu_fn
> (/home/ubuntu/kernel-6.4-rc5/kernel/workqueue.c:5370)
> [ 10.119190] process_one_work
> (/home/ubuntu/kernel-6.4-rc5/kernel/workqueue.c:2410)
> [ 10.123177] worker_thread
> (/home/ubuntu/kernel-6.4-rc5/./include/linux/list.h:292
> /home/ubuntu/kernel-6.4-rc5/kernel/workqueue.c:2465
> /home/ubuntu/kernel-6.4-rc5/kernel/workqueue.c:2557)
> [ 10.126905] kthread (/home/ubuntu/kernel-6.4-rc5/kernel/kthread.c:379)
> [ 10.130114] ret_from_fork
> (/home/ubuntu/kernel-6.4-rc5/arch/arm64/kernel/entry.S:871)
> [ 10.133670] Code: 910003fd a90153f3 12800007 d3490842 (b9003406)
> All code
> ========
> 0: 910003fd mov x29, sp
> 4: a90153f3 stp x19, x20, [sp, #16]
> 8: 12800007 mov w7, #0xffffffff // #-1
> c: d3490842 ubfiz x2, x2, #55, #3
> 10:* b9003406 str w6, [x0, #52] <-- trapping instruction
>
> Code starting with the faulting instruction
> ===========================================
> 0: b9003406 str w6, [x0, #52]
> [ 10.139730] ---[ end trace 0000000000000000 ]---
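
As a cross-check on the quoted Oops, the "Mem abort info" fields can be rederived from the raw ESR value, and the faulting address from the trapping instruction `str w6, [x0, #52]` together with the logged x0. The sketch below assumes the standard AArch64 ESR layout for a data abort (EC in bits 31:26, IL in bit 25, ISS in bits 24:0, with WnR in ISS bit 6 and the fault status code in ISS bits 5:0). The function name `decode_data_abort_esr` is made up for this example.

```python
def decode_data_abort_esr(esr: int) -> dict:
    """Split an AArch64 ESR value for a data abort into the fields the log prints."""
    iss = esr & 0x1FFFFFF  # ISS: bits 24:0
    return {
        "EC": (esr >> 26) & 0x3F,  # exception class, bits 31:26
        "IL": (esr >> 25) & 0x1,   # 1 means a 32-bit instruction trapped
        "ISS": iss,
        "WnR": (iss >> 6) & 0x1,   # 1 means the access was a write
        "FSC": iss & 0x3F,         # fault status code, bits 5:0
    }


# ESR value copied from the quoted log.
fields = decode_data_abort_esr(0x0000000096000046)
for name, value in fields.items():
    print(f"{name} = {value:#x}")

# The trapping instruction stores to [x0, #52]; x0 is taken from the register dump.
x0 = 0xFFFFFFFFFFE00000
fault_addr = x0 + 52
print(f"fault address = {fault_addr:#x}")
```

The decoded values agree with the log (EC = 0x25, WnR = 1, FSC = 0x06), and x0 + 52 gives ffffffffffe00034, the address in the "Unable to handle kernel paging request" line. In other words, the fault is a write through the page pointer held in x0 inside `__init_zone_device_page`.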