Disabling CONFIG_HSA_AMD_SVM solved my problem. The Radeon RX 6800 now works 
fine on my ARM server (Hardware name: Think-Force Technology Universal 
Server/7140 Advanced, BIOS 1.1.7 20230216). Thank you so much.

------------------------------------------------------------------
Sender: Lang Yu <lang...@amd.com>
Sent At: 2023 Jun. 9 (Fri.) 16:37
Recipient: 彭逸豪 <pengyi...@linzhuotech.com>
Cc: alexander.deucher <alexander.deuc...@amd.com>; amd-gfx 
<amd-gfx@lists.freedesktop.org>
Subject: Re: Radeon RX 6800 does not work properly on Think-Force 7140 ARM 
server (generates Oops and causes system deadlock)


Try disabling CONFIG_HSA_AMD_SVM in your kernel config.

Regards,
Lang
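
To confirm whether a given kernel config actually has the option enabled 
before and after rebuilding, something like the following could be used. It is 
only a minimal sketch: the helper name config_state and the sample text are 
made up for illustration, and it assumes the usual .config line formats 
("CONFIG_FOO=y" and "# CONFIG_FOO is not set").

```python
def config_state(config_text: str, symbol: str) -> str:
    """Return the value assigned to `symbol` in a kernel .config,
    or "n" if it is explicitly unset or not mentioned at all."""
    for raw in config_text.splitlines():
        line = raw.strip()
        # Disabled options are recorded as a comment line.
        if line == f"# {symbol} is not set":
            return "n"
        # Enabled options look like CONFIG_FOO=y, =m, or =<value>.
        if line.startswith(f"{symbol}="):
            return line.split("=", 1)[1]
    return "n"


# Hypothetical excerpts, not taken from the attached config file.
sample_before = "CONFIG_HSA_AMD=y\nCONFIG_HSA_AMD_SVM=y\n"
sample_after = "CONFIG_HSA_AMD=y\n# CONFIG_HSA_AMD_SVM is not set\n"

print(config_state(sample_before, "CONFIG_HSA_AMD_SVM"))
print(config_state(sample_after, "CONFIG_HSA_AMD_SVM"))
```

In practice the text would be read from the .config in the kernel build tree 
(the path depends on your setup) instead of the sample strings.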

On 06/09/ , 彭逸豪 wrote:
> I have a Radeon RX 6800 that I want to use in my ARM server (Hardware name: 
> Think-Force Technology Universal Server/7140 Advanced, BIOS 1.1.7 20230216). 
> However, with this GPU installed the kernel Oopses, or panics on some older 
> versions. Even when the kernel does not panic, the system falls into a 
> "deadlock" state and I cannot log in normally.
> 
> Below is the Oops from 6.4.0-rc5 (the full log from the serial port is 
> attached). After the Oops, the GPU cannot be used and the system is stuck in 
> a "deadlock" state: I cannot log in normally after entering the user name. 
> If the GPU is removed or replaced with another GPU, such as an older Radeon 
> RX 560, I can log in normally. The Radeon RX 560 works fine with 6.4.0-rc5.
> 
> I have tried multiple kernel versions, from 5.15 to 6.4.0-rc5. They all 
> produce a similar Oops or panic, the GPU cannot be used, and I cannot log in 
> normally.
> 
> The attachment contains the full kernel log captured from the serial port, 
> and my 6.4-rc5 config file. Please let me know if additional information is 
> needed.
> 
> (Note: The previous email did not have the correct subject, so I retracted 
> it. I am sorry if you have received duplicate emails.)
> 
> [    6.535108] cma: cma_alloc: reserved: alloc failed, req-size: 2 pages, 
> ret: -12
> [    9.824070] Unable to handle kernel paging request at virtual address 
> ffffffffffe00034
> [    9.831955] Mem abort info:
> [    9.834737]   ESR = 0x0000000096000046
> [    9.838469]   EC = 0x25: DABT (current EL), IL = 32 bits
> [    9.843756]   SET = 0, FnV = 0
> [    9.846794]   EA = 0, S1PTW = 0
> [    9.849919]   FSC = 0x06: level 2 translation fault
> [    9.854773] Data abort info:
> [    9.857638]   ISV = 0, ISS = 0x00000046
> [    9.861454]   CM = 0, WnR = 1
> [    9.864405] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000001ff5489000
> [    9.871074] [ffffffffffe00034] pgd=10000001b078b003, p4d=10000001b078b003, 
> pud=10000001b078a003, pmd=0000000000000000
> [    9.881637] Internal error: Oops: 0000000096000046 [#1] SMP
> [    9.887180] Modules linked in: input_leds hid_generic amdgpu(+) usbhid hid 
> cdc_ether usbnet snd_hda_codec_hdmi binfmt_misc snd_hda_intel 
> snd_intel_dspcfg snd_hda_codec snd_hda_core gpu_sched drm_buddy video 
> snd_hwdep drm_suballoc_helper drm_ttm_helper snd_pcm ttm onboard_usb_hub 
> nls_iso8859_1 drm_display_helper snd_seq_midi snd_seq_midi_event ast 
> snd_rawmidi cec rc_core snd_seq drm_shmem_helper drm_kms_helper 
> snd_seq_device snd_timer snd ipmi_ssif ipmi_devintf syscopyarea crct10dif_ce 
> sysfillrect ipmi_msghandler soundcore arm_spe_pmu sysimgblt sch_fq_codel drm 
> pstore_blk ramoops reed_solomon pstore_zone efi_pstore ip_tables x_tables 
> autofs4 nvme igb nvme_core nvme_common i2c_algo_bit xhci_plat_hcd
> [    9.948668] CPU: 0 PID: 305 Comm: kworker/0:2 Tainted: G        W          
> 6.4.0-rc5 #1
> [    9.956630] Hardware name: Think-Force Technology Universal Server/7140 
> Advanced, BIOS 1.1.7 20230216
> [    9.965801] Workqueue: events work_for_cpu_fn
> [    9.970137] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [    9.977060] pc : __init_zone_device_page 
> (/home/ubuntu/kernel-6.4-rc5/./include/linux/atomic/atomic-instrumented.h:42 
> /home/ubuntu/kernel-6.4-rc5/./include/linux/page_ref.h:99 
> /home/ubuntu/kernel-6.4-rc5/./include/linux/page_ref.h:115 
> /home/ubuntu/kernel-6.4-rc5/mm/mm_init.c:557 
> /home/ubuntu/kernel-6.4-rc5/mm/mm_init.c:966) 
> [    9.981826] lr : memmap_init_zone_device 
> (/home/ubuntu/kernel-6.4-rc5/mm/mm_init.c:1084) 
> [    9.986677] sp : ffff80000aea3980
> [    9.989971] x29: ffff80000aea3980 x28: 0000000000000000 x27: 
> 0000000fffff8000
> [    9.997068] x26: ffff80000a8c5f70 x25: ffff0001c118d6a0 x24: 
> fffffc0000000000
> [   10.004165] x23: 0000001000000000 x22: ffff800009bc5e98 x21: 
> 0000000000000001
> [   10.011262] x20: 0000000000000001 x19: ffffffffffe00000 x18: 
> 0000000000000000
> [   10.018360] x17: 0000000000000000 x16: 0000000000000000 x15: 
> 0000000000000000
> [   10.025457] x14: 0000000000000000 x13: 0000000000000000 x12: 
> 0000000000000000
> [   10.032554] x11: 0000000000000000 x10: 0000000000000000 x9 : 
> ffff8000094032cc
> [   10.039651] x8 : 0000000000000000 x7 : 00000000ffffffff x6 : 
> 0000000000000001
> [   10.046748] x5 : 0000000000000000 x4 : ffff0001c118d6a0 x3 : 
> 0000000000000000
> [   10.053845] x2 : 0200000000000000 x1 : 0000000fffff8000 x0 : 
> ffffffffffe00000
> [   10.060943] Call trace:
> [   10.063373] __init_zone_device_page 
> (/home/ubuntu/kernel-6.4-rc5/./include/linux/atomic/atomic-instrumented.h:42 
> /home/ubuntu/kernel-6.4-rc5/./include/linux/page_ref.h:99 
> /home/ubuntu/kernel-6.4-rc5/./include/linux/page_ref.h:115 
> /home/ubuntu/kernel-6.4-rc5/mm/mm_init.c:557 
> /home/ubuntu/kernel-6.4-rc5/mm/mm_init.c:966) 
> [   10.067791] memmap_init_zone_device 
> (/home/ubuntu/kernel-6.4-rc5/mm/mm_init.c:1084) 
> [   10.072297] memremap_pages (/home/ubuntu/kernel-6.4-rc5/mm/memremap.c:270 
> /home/ubuntu/kernel-6.4-rc5/mm/memremap.c:366) 
> [   10.076111] devm_memremap_pages 
> (/home/ubuntu/kernel-6.4-rc5/mm/memremap.c:407) 
> [   10.080183] svm_migrate_init 
> (/home/ubuntu/kernel-6.4-rc5/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_migrate.c:1029)
>  amdgpu
> [   10.085112] kgd2kfd_device_init 
> (/home/ubuntu/kernel-6.4-rc5/drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_device.c:647)
>  amdgpu
> [   10.090318] amdgpu_amdkfd_device_init 
> (/home/ubuntu/kernel-6.4-rc5/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c:197) 
> amdgpu
> [   10.096039] amdgpu_device_init 
> (/home/ubuntu/kernel-6.4-rc5/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:2537 
> /home/ubuntu/kernel-6.4-rc5/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:3871) 
> amdgpu
> [   10.101329] amdgpu_driver_load_kms 
> (/home/ubuntu/kernel-6.4-rc5/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c:146) 
> amdgpu
> [   10.106704] amdgpu_pci_probe 
> (/home/ubuntu/kernel-6.4-rc5/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:2149) 
> amdgpu
> [   10.111648] local_pci_probe 
> (/home/ubuntu/kernel-6.4-rc5/drivers/pci/pci-driver.c:325) 
> [   10.115462] work_for_cpu_fn 
> (/home/ubuntu/kernel-6.4-rc5/kernel/workqueue.c:5370) 
> [   10.119190] process_one_work 
> (/home/ubuntu/kernel-6.4-rc5/kernel/workqueue.c:2410) 
> [   10.123177] worker_thread 
> (/home/ubuntu/kernel-6.4-rc5/./include/linux/list.h:292 
> /home/ubuntu/kernel-6.4-rc5/kernel/workqueue.c:2465 
> /home/ubuntu/kernel-6.4-rc5/kernel/workqueue.c:2557) 
> [   10.126905] kthread (/home/ubuntu/kernel-6.4-rc5/kernel/kthread.c:379) 
> [   10.130114] ret_from_fork 
> (/home/ubuntu/kernel-6.4-rc5/arch/arm64/kernel/entry.S:871) 
> [ 10.133670] Code: 910003fd a90153f3 12800007 d3490842 (b9003406)
> All code
> ========
>    0: 910003fd  mov x29, sp
>    4: a90153f3  stp x19, x20, [sp, #16]
>    8: 12800007  mov w7, #0xffffffff             // #-1
>    c: d3490842  ubfiz x2, x2, #55, #3
>   10:* b9003406  str w6, [x0, #52]  <-- trapping instruction
> 
> Code starting with the faulting instruction
> ===========================================
>    0: b9003406  str w6, [x0, #52]
> [   10.139730] ---[ end trace 0000000000000000 ]---
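
The numbers in the Oops above can be cross-checked with a little arithmetic. 
The sketch below decodes the ESR value using the Arm data-abort field layout 
and recomputes the faulting address from x0 and the store offset. The last 
check is an assumption on my part, not something the log states: it treats 
x24 as the arm64 vmemmap base, x1 as the PFN being initialised, and 
sizeof(struct page) as 64 bytes.

```python
MASK64 = (1 << 64) - 1

ESR = 0x96000046  # from "ESR = 0x0000000096000046"


def decode_data_abort_esr(esr: int) -> dict:
    """Split an ESR_ELx value for a data abort into the fields the log prints."""
    return {
        "EC": (esr >> 26) & 0x3F,   # exception class, bits [31:26]
        "IL": (esr >> 25) & 0x1,    # 1 means a 32-bit instruction
        "ISV": (esr >> 24) & 0x1,
        "SET": (esr >> 11) & 0x3,
        "FnV": (esr >> 10) & 0x1,
        "EA": (esr >> 9) & 0x1,
        "CM": (esr >> 8) & 0x1,
        "S1PTW": (esr >> 7) & 0x1,
        "WnR": (esr >> 6) & 0x1,    # 1 means the access was a write
        "FSC": esr & 0x3F,          # fault status code, bits [5:0]
    }


fields = decode_data_abort_esr(ESR)
print({name: hex(value) for name, value in fields.items()})

# Trapping instruction: str w6, [x0, #52]
X0 = 0xFFFFFFFFFFE00000
fault_addr = (X0 + 52) & MASK64
print(hex(fault_addr))

# Assumption: x24 = vmemmap base, x1 = PFN, struct page is 64 bytes.
X24 = 0xFFFFFC0000000000
X1 = 0x0000000FFFFF8000
STRUCT_PAGE_SIZE = 64
page_addr = (X24 + X1 * STRUCT_PAGE_SIZE) & MASK64
print(hex(page_addr))
```

If that assumption holds, the struct page for this PFN would sit at the 
address in x0, which the page-table walk in the log shows as unmapped 
(pmd=0000000000000000).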
