On Sunday, 23.03.2025 at 17:51 +1100, Balbir Singh wrote:
> On 3/22/25 23:23, Bert Karwatzki wrote:
> > The problem occurs in this part of ttm_tt_populate(). In the nokaslr case
> > the loop is entered and run repeatedly because ttm_dma32_pages_allocated
> > exceeds ttm_dma32_pages_limit, which leads to many calls to
> > ttm_global_swapout().
> >
> > if (!strcmp(get_current()->comm, "stellaris"))
> >     printk(KERN_INFO "%s: ttm_pages_allocated=0x%llx ttm_pages_limit=0x%lx ttm_dma32_pages_allocated=0x%llx ttm_dma32_pages_limit=0x%lx\n",
> >            __func__, ttm_pages_allocated.counter, ttm_pages_limit,
> >            ttm_dma32_pages_allocated.counter, ttm_dma32_pages_limit);
> > while (atomic_long_read(&ttm_pages_allocated) > ttm_pages_limit ||
> >        atomic_long_read(&ttm_dma32_pages_allocated) >
> >        ttm_dma32_pages_limit) {
> >
> >     if (!strcmp(get_current()->comm, "stellaris"))
> >             printk(KERN_INFO "%s: count=%d ttm_pages_allocated=0x%llx ttm_pages_limit=0x%lx ttm_dma32_pages_allocated=0x%llx ttm_dma32_pages_limit=0x%lx\n",
> >                    __func__, count++, ttm_pages_allocated.counter,
> >                    ttm_pages_limit, ttm_dma32_pages_allocated.counter,
> >                    ttm_dma32_pages_limit);
> >     ret = ttm_global_swapout(ctx, GFP_KERNEL);
> >     if (ret == 0)
> >             break;
> >     if (ret < 0)
> >             goto error;
> > }
> >
> > Without nokaslr, ttm_dma32_pages_allocated is 0 because use_dma32 == false
> > in that case.
> >
> > So why is use_dma32 enabled with nokaslr? Some more printk()s give this
> > result:
> >
> > The GPUs:
> > built-in:
> > 08:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne [Radeon Vega Series / Radeon Vega Mobile Series] (rev c5)
> > discrete:
> > 03:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 23 [Radeon RX 6600/6600 XT/6600M] (rev c3)
> >
> > With nokaslr:
> > [    1.266517] [    T328] dma_addressing_limited: mask = 0xfffffffffff bus_dma_limit = 0x0 required_mask = 0xfffffffff
> > [    1.266519] [    T328] dma_addressing_limited: ops = 0000000000000000 use_dma_iommu(dev) = 0
> > [    1.266520] [    T328] dma_direct_all_ram_mapped: returning true
> > [    1.266521] [    T328] dma_addressing_limited: returning ret = 0
> > [    1.266521] [    T328] amdgpu 0000:03:00.0: amdgpu: amdgpu_ttm_init: calling ttm_device_init() with use_dma32 = 0
> > [    1.266525] [    T328] entering ttm_device_init, use_dma32 = 0
> > [    1.267115] [    T328] entering ttm_pool_init, use_dma32 = 0
> >
> > [    3.965669] [    T328] dma_addressing_limited: mask = 0xfffffffffff bus_dma_limit = 0x0 required_mask = 0x3fffffffffff
> > [    3.965671] [    T328] dma_addressing_limited: returning true
> > [    3.965672] [    T328] amdgpu 0000:08:00.0: amdgpu: amdgpu_ttm_init: calling ttm_device_init() with use_dma32 = 1
> > [    3.965674] [    T328] entering ttm_device_init, use_dma32 = 1
> > [    3.965747] [    T328] entering ttm_pool_init, use_dma32 = 1
> >
> > Without nokaslr:
> > [    1.300907] [    T351] dma_addressing_limited: mask = 0xfffffffffff bus_dma_limit = 0x0 required_mask = 0xfffffffff
> > [    1.300909] [    T351] dma_addressing_limited: ops = 0000000000000000 use_dma_iommu(dev) = 0
> > [    1.300910] [    T351] dma_direct_all_ram_mapped: returning true
> > [    1.300910] [    T351] dma_addressing_limited: returning ret = 0
> > [    1.300911] [    T351] amdgpu 0000:03:00.0: amdgpu: amdgpu_ttm_init: calling ttm_device_init() with use_dma32 = 0
> > [    1.300915] [    T351] entering ttm_device_init, use_dma32 = 0
> > [    1.301210] [    T351] entering ttm_pool_init, use_dma32 = 0
> >
> > [    4.000602] [    T351] dma_addressing_limited: mask = 0xfffffffffff bus_dma_limit = 0x0 required_mask = 0xfffffffffff
> > [    4.000603] [    T351] dma_addressing_limited: ops = 0000000000000000 use_dma_iommu(dev) = 0
> > [    4.000604] [    T351] dma_direct_all_ram_mapped: returning true
> > [    4.000605] [    T351] dma_addressing_limited: returning ret = 0
> > [    4.000606] [    T351] amdgpu 0000:08:00.0: amdgpu: amdgpu_ttm_init: calling ttm_device_init() with use_dma32 = 0
> > [    4.000610] [    T351] entering ttm_device_init, use_dma32 = 0
> > [    4.000687] [    T351] entering ttm_pool_init, use_dma32 = 0
> >
> > So with nokaslr the required mask for the built-in GPU changes from
> > 0xfffffffffff to 0x3fffffffffff, which causes dma_addressing_limited() to
> > return true, which in turn causes ttm_device_init() to be called with
> > use_dma32 = true.
>
> Thanks, this is really the root cause, from what I understand.
>
> > It also shows that nothing changes for the discrete GPU, so the bug does
> > not occur there.
> >
> > I was also able to work around the bug by calling ttm_device_init() with
> > use_dma32=false from amdgpu_ttm_init()
> > (drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c), but I'm not sure whether this has
> > unwanted side effects.
> >
> > int amdgpu_ttm_init(struct amdgpu_device *adev)
> > {
> >     uint64_t gtt_size;
> >     int r;
> >
> >     mutex_init(&adev->mman.gtt_window_lock);
> >
> >     dma_set_max_seg_size(adev->dev, UINT_MAX);
> >     /* No others user of address space so set it to 0 */
> >     dev_info(adev->dev, "%s: calling ttm_device_init() with use_dma32 = 0 ignoring %d\n",
> >              __func__, dma_addressing_limited(adev->dev));
> >     r = ttm_device_init(&adev->mman.bdev, &amdgpu_bo_driver, adev->dev,
> >                            adev_to_drm(adev)->anon_inode->i_mapping,
> >                            adev_to_drm(adev)->vma_offset_manager,
> >                            adev->need_swiotlb,
> >                            false /* use_dma32 */);
> >     if (r) {
> >             DRM_ERROR("failed initializing buffer object driver(%d).\n", r);
> >             return r;
> >     }
> >
>
> I think this brings us really close. Instead of forcing use_dma32 to false, I
> wonder if we need something like
>
> uint64_t dma_bits = fls64(dma_get_mask(adev->dev));
>
> and then, in the call to ttm_device_init(), pass the last argument
> (use_dma32) as dma_bits < 32?
>
>
> Thanks,
> Balbir Singh
>
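
To see what the suggested condition would evaluate to for the mask in the logs above (mask = 0xfffffffffff), here is a small userspace sketch. It is not kernel code: fls64_sketch() is a stand-in written for this example that is assumed to behave like the kernel's fls64() (1-based position of the highest set bit, 0 for no bits set), and use_dma32_from_mask() is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace stand-in for the kernel's fls64(): returns the 1-based index
 * of the most significant set bit, or 0 if no bit is set.
 */
static int fls64_sketch(uint64_t x)
{
	int bits = 0;

	while (x) {
		bits++;
		x >>= 1;
	}
	return bits;
}

/*
 * Hypothetical helper (not an existing kernel function): derive use_dma32
 * from the device DMA mask using the "dma_bits < 32" condition proposed
 * in the mail above.
 */
static bool use_dma32_from_mask(uint64_t dma_mask)
{
	return fls64_sketch(dma_mask) < 32;
}
```

For the 44-bit mask 0xfffffffffff this gives dma_bits = 44, so use_dma32 would be false on both GPUs. Note that a full 32-bit mask (0xffffffff) gives dma_bits = 32, so with a strict "< 32" it would not select use_dma32 either; whether "<= 32" is what is wanted is a question for the proposal, not something this sketch decides.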

Do these address bits have to shift when using nokaslr or PCI_P2PDMA? I think
this shift causes the increase of the required DMA mask to 0x3fffffffffff.

@@ -104,4 +104,4 @@
       fe30300000-fe303fffff : 0000:04:00.0
     fe30400000-fe30403fff : 0000:04:00.0
     fe30404000-fe30404fff : 0000:04:00.0
-afe00000000-affffffffff : 0000:03:00.0
+3ffe00000000-3fffffffffff : 0000:03:00.0

And what memory is this? It's 8G in size so it could be the RAM of the discrete
GPU (which is at PCI 0000:03:00.0), but that is already here (part of
/proc/iomem):

1010000000-ffffffffff : PCI Bus 0000:00
  fc00000000-fe0fffffff : PCI Bus 0000:01
    fc00000000-fe0fffffff : PCI Bus 0000:02
      fc00000000-fe0fffffff : PCI Bus 0000:03
        fc00000000-fdffffffff : 0000:03:00.0  GPU RAM
        fe00000000-fe0fffffff : 0000:03:00.0
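
The sizes quoted here follow directly from the inclusive start and end addresses printed in /proc/iomem. A minimal sketch of that arithmetic (range_size_bytes() is a name made up for this example):

```c
#include <stdint.h>

/*
 * Size in bytes of a /proc/iomem style range, where both the start and
 * the end address are inclusive.
 */
static uint64_t range_size_bytes(uint64_t start, uint64_t end_inclusive)
{
	return end_inclusive - start + 1;
}
```

Both afe00000000-affffffffff and 3ffe00000000-3fffffffffff come out as 0x200000000 bytes (8 GiB), the same size as fc00000000-fdffffffff, while fe00000000-fe0fffffff is 0x10000000 bytes (256 MiB).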

lspci -v reports 8G of memory at 0xfc00000000, so I assumed that is the GPU RAM.
03:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 23 [Radeon RX 6600/6600 XT/6600M] (rev c3)
        Subsystem: Micro-Star International Co., Ltd. [MSI] Device 1313
        Flags: bus master, fast devsel, latency 0, IRQ 107, IOMMU group 14
        Memory at fc00000000 (64-bit, prefetchable) [size=8G]
        Memory at fe00000000 (64-bit, prefetchable) [size=256M]
        Memory at fca00000 (32-bit, non-prefetchable) [size=1M]
        Expansion ROM at fcb00000 [disabled] [size=128K]
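
Coming back to the moved 0000:03:00.0 range: the numbers are at least consistent with the required mask being the smallest all-ones mask that covers the highest address that has to be reachable, and with the first check in dma_addressing_limited() comparing that against the device mask. The userspace sketch below only models this arithmetic. The function names are invented for the example, and the assumption that the end of the moved range is what raises the required mask is mine, not something the logs prove.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Smallest mask of the form 2^n - 1 that covers max_addr: copy the highest
 * set bit into every lower bit position.
 */
static uint64_t required_mask_sketch(uint64_t max_addr)
{
	uint64_t m = max_addr;

	m |= m >> 1;
	m |= m >> 2;
	m |= m >> 4;
	m |= m >> 8;
	m |= m >> 16;
	m |= m >> 32;
	return m;
}

/* Minimum of two values where 0 means "no limit" (cf. bus_dma_limit = 0x0). */
static uint64_t min_not_zero_sketch(uint64_t a, uint64_t b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a < b ? a : b;
}

/*
 * Models only the mask comparison seen in the printk output, i.e. the
 * path that logs "returning true" right after the mask line.
 */
static bool mask_too_small(uint64_t dev_mask, uint64_t bus_dma_limit,
			   uint64_t max_addr)
{
	return min_not_zero_sketch(dev_mask, bus_dma_limit) <
	       required_mask_sketch(max_addr);
}
```

With the range ending at 0xaffffffffff the covering mask is 0xfffffffffff, which the 44-bit device mask satisfies. With the range ending at 0x3fffffffffff the covering mask is 0x3fffffffffff, which is larger than the device mask, matching the required_mask value and the "returning true" line in the nokaslr log.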

Bert Karwatzki
