Dan Horák <[email protected]> writes:

> Hi Ritesh,
>
> On Sun, 15 Mar 2026 09:55:11 +0530
> Ritesh Harjani (IBM) <[email protected]> wrote:
>
>> Dan Horák <[email protected]> writes:
>> 
>> +cc Gaurav,
>> 
>> > Hi,
>> >
>> > starting with 7.0-rc1 (meaning 6.19 is OK) the amdgpu driver fails to
>> > initialize on my Linux/ppc64le Power9 based system (with Radeon Pro WX4100)
>> > with the following in the log
>> >
>> > ...
>> > bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: GART: 256M 
>> > 0x000000FF00000000 - 0x000000FF0FFFFFFF
>> 
>>                   ^^^^
>> So looks like this is a PowerNV (Power9) machine.
>
> correct :-)
>  
>> > bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: [drm] Detected 
>> > VRAM RAM=4096M, BAR=4096M
>> > bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: [drm] RAM 
>> > width 128bits GDDR5
>> > bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: iommu: 64-bit 
>> > OK but direct DMA is limited by 0
>> > bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: 
>> > dma_iommu_get_required_mask: returning bypass mask 0xfffffffffffffff
>> > bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0:  4096M of VRAM 
>> > memory ready
>> > bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0:  32570M of GTT 
>> > memory ready.
>> > bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: (-12) failed 
>> > to allocate kernel bo
>> > bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: [drm] Debug 
>> > VRAM access will use slowpath MM access
>> > bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: [drm] GART: 
>> > num cpu pages 4096, num gpu pages 65536
>> > bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: [drm] PCIE 
>> > GART of 256M enabled (table at 0x000000F4FFF80000).
>> > bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: (-12) failed 
>> > to allocate kernel bo
>> > bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: (-12) create 
>> > WB bo failed
>> > bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: 
>> > amdgpu_device_wb_init failed -12
>> > bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: 
>> > amdgpu_device_ip_init failed
>> > bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: Fatal error 
>> > during GPU init
>> > bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: finishing 
>> > device.
>> > bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: probe with 
>> > driver amdgpu failed with error -12
>> > bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0:  ttm finalized
>> > ...
>> >
>> > After some hints from Alex and bisecting and other investigation I have
>> > found that 
>> > https://github.com/torvalds/linux/commit/1471c517cf7dae1a6342fb821d8ed501af956dd0
>> > is the culprit and reverting it makes amdgpu load (and work) again.
>> 
>> Thanks for confirming this. Yes, this was recently added [1]
>> 
>> [1]: 
>> https://lore.kernel.org/linuxppc-dev/[email protected]/
>>  
>> 
>> 
>> @Gaurav,
>> 
>> I am not too familiar with the area, however looking at the logs shared
>> by Dan, it looks like we might be always going for dma direct allocation
>> path and maybe the device doesn't support this address limit. 
>> 
>>  bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: iommu: 64-bit 
>> OK but direct DMA is limited by 0
>>  bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: 
>> dma_iommu_get_required_mask: returning bypass mask 0xfffffffffffffff
>
> a complete kernel log is at
> https://gitlab.freedesktop.org/-/project/4522/uploads/c4935bca6f37bbd06bb4045c07d00b5b/kernel.log
>
> Please let me know if you need more info.

Hi Dan,

Thanks for sharing the kernel log. Could you also share the full kernel
config with which you saw this issue?

I think Gaurav is still looking into the reported issue. However, I was
interested in this kernel log output:

bře 05 08:35:34 talos.danny.cz kernel: radix-mmu: Mapped 
0x00002007fad00000-0x00002007fcd00000 with 64.0 KiB pages

This shows that the system is using a 64K page size, so I would like to
know which kernel configs you have enabled. Donet has recently posted
64K page size support for amdgpu on Power [1][2]. However, I think
amdgpu can still be used without Donet's changes if CONFIG_HSA_AMD_SVM
is disabled.
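
In case it helps when collecting the config, here is a minimal sketch
that pulls the options relevant to this discussion out of a kernel
config file. The helper name, the option list, and the sample text are
assumptions made for illustration only; the sample is not Dan's actual
config.

```python
import re

# Options of interest for this thread (an assumed, illustrative list).
OPTIONS = [
    "CONFIG_PPC_64K_PAGES",
    "CONFIG_DRM_AMDGPU",
    "CONFIG_HSA_AMD",
    "CONFIG_HSA_AMD_SVM",
]

def read_config_options(config_text, names):
    """Return {name: value} for the requested options.

    The value is the text after '=' ("y", "m", ...), "n" for an
    explicit "# CONFIG_FOO is not set" line, or None if the option
    does not appear in the text at all.
    """
    found = {name: None for name in names}
    for line in config_text.splitlines():
        line = line.strip()
        enabled = re.match(r"^(CONFIG_\w+)=(.*)$", line)
        if enabled and enabled.group(1) in found:
            found[enabled.group(1)] = enabled.group(2)
            continue
        disabled = re.match(r"^# (CONFIG_\w+) is not set$", line)
        if disabled and disabled.group(1) in found:
            found[disabled.group(1)] = "n"
    return found

# Hypothetical sample text, only to show the input format.
SAMPLE = """\
CONFIG_PPC_64K_PAGES=y
CONFIG_DRM_AMDGPU=m
CONFIG_HSA_AMD=y
# CONFIG_HSA_AMD_SVM is not set
"""

if __name__ == "__main__":
    for name, value in read_config_options(SAMPLE, OPTIONS).items():
        print(f"{name}: {value}")
```

On a running system the config text is usually available as
/boot/config-$(uname -r), or as /proc/config.gz when the kernel was
built with CONFIG_IKCONFIG_PROC.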

So, if possible, could you share the kernel config and the details of
the AMD GPU hardware attached to your Power9 bare-metal system?

[1]: https://lore.kernel.org/amd-gfx/[email protected]/#t (merged)
[2]: https://lore.kernel.org/amd-gfx/[email protected]/ (in review)

-ritesh
