On Sunday, 23.03.2025 at 17:51 +1100, Balbir Singh wrote:
> On 3/22/25 23:23, Bert Karwatzki wrote:
> > The problem occurs in this part of ttm_tt_populate(): in the nokaslr
> > case the loop is entered and run repeatedly, because
> > ttm_dma32_pages_allocated exceeds ttm_dma32_pages_limit, which leads
> > to lots of calls to ttm_global_swapout().
> >
> > 	if (!strcmp(get_current()->comm, "stellaris"))
> > 		printk(KERN_INFO "%s: ttm_pages_allocated=0x%llx ttm_pages_limit=0x%lx ttm_dma32_pages_allocated=0x%llx ttm_dma32_pages_limit=0x%lx\n",
> > 			__func__, ttm_pages_allocated.counter, ttm_pages_limit,
> > 			ttm_dma32_pages_allocated.counter, ttm_dma32_pages_limit);
> > 	while (atomic_long_read(&ttm_pages_allocated) > ttm_pages_limit ||
> > 	       atomic_long_read(&ttm_dma32_pages_allocated) > ttm_dma32_pages_limit) {
> >
> > 		if (!strcmp(get_current()->comm, "stellaris"))
> > 			printk(KERN_INFO "%s: count=%d ttm_pages_allocated=0x%llx ttm_pages_limit=0x%lx ttm_dma32_pages_allocated=0x%llx ttm_dma32_pages_limit=0x%lx\n",
> > 				__func__, count++, ttm_pages_allocated.counter, ttm_pages_limit,
> > 				ttm_dma32_pages_allocated.counter, ttm_dma32_pages_limit);
> > 		ret = ttm_global_swapout(ctx, GFP_KERNEL);
> > 		if (ret == 0)
> > 			break;
> > 		if (ret < 0)
> > 			goto error;
> > 	}
> >
> > In the case without nokaslr, ttm_dma32_pages_allocated stays 0 because
> > use_dma32 == false there.
> >
> > So why is use_dma32 enabled with nokaslr? Some more printk()s give
> > this result:
> >
> > The GPUs:
> > built-in:
> > 08:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne [Radeon Vega Series / Radeon Vega Mobile Series] (rev c5)
> > discrete:
> > 03:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 23 [Radeon RX 6600/6600 XT/6600M] (rev c3)
> >
> > With nokaslr:
> > [    1.266517] [  T328] dma_addressing_limited: mask = 0xfffffffffff bus_dma_limit = 0x0 required_mask = 0xfffffffff
> > [    1.266519] [  T328] dma_addressing_limited: ops = 0000000000000000 use_dma_iommu(dev) = 0
> > [    1.266520] [  T328] dma_direct_all_ram_mapped: returning true
> > [    1.266521] [  T328] dma_addressing_limited: returning ret = 0
> > [    1.266521] [  T328] amdgpu 0000:03:00.0: amdgpu: amdgpu_ttm_init: calling ttm_device_init() with use_dma32 = 0
> > [    1.266525] [  T328] entering ttm_device_init, use_dma32 = 0
> > [    1.267115] [  T328] entering ttm_pool_init, use_dma32 = 0
> >
> > [    3.965669] [  T328] dma_addressing_limited: mask = 0xfffffffffff bus_dma_limit = 0x0 required_mask = 0x3fffffffffff
> > [    3.965671] [  T328] dma_addressing_limited: returning true
> > [    3.965672] [  T328] amdgpu 0000:08:00.0: amdgpu: amdgpu_ttm_init: calling ttm_device_init() with use_dma32 = 1
> > [    3.965674] [  T328] entering ttm_device_init, use_dma32 = 1
> > [    3.965747] [  T328] entering ttm_pool_init, use_dma32 = 1
> >
> > Without nokaslr:
> > [    1.300907] [  T351] dma_addressing_limited: mask = 0xfffffffffff bus_dma_limit = 0x0 required_mask = 0xfffffffff
> > [    1.300909] [  T351] dma_addressing_limited: ops = 0000000000000000 use_dma_iommu(dev) = 0
> > [    1.300910] [  T351] dma_direct_all_ram_mapped: returning true
> > [    1.300910] [  T351] dma_addressing_limited: returning ret = 0
> > [    1.300911] [  T351] amdgpu 0000:03:00.0: amdgpu: amdgpu_ttm_init: calling ttm_device_init() with use_dma32 = 0
> > [    1.300915] [  T351] entering ttm_device_init, use_dma32 = 0
> > [    1.301210] [  T351] entering ttm_pool_init, use_dma32 = 0
> >
> > [    4.000602] [  T351] dma_addressing_limited: mask = 0xfffffffffff bus_dma_limit = 0x0 required_mask = 0xfffffffffff
> > [    4.000603] [  T351] dma_addressing_limited: ops = 0000000000000000 use_dma_iommu(dev) = 0
> > [    4.000604] [  T351] dma_direct_all_ram_mapped: returning true
> > [    4.000605] [  T351] dma_addressing_limited: returning ret = 0
> > [    4.000606] [  T351] amdgpu 0000:08:00.0: amdgpu: amdgpu_ttm_init: calling ttm_device_init() with use_dma32 = 0
> > [    4.000610] [  T351] entering ttm_device_init, use_dma32 = 0
> > [    4.000687] [  T351] entering ttm_pool_init, use_dma32 = 0
> >
> > So with nokaslr the required mask for the built-in GPU changes from
> > 0xfffffffffff to 0x3fffffffffff, which causes dma_addressing_limited()
> > to return true, which in turn causes ttm_device_init() to be called
> > with use_dma32 = true.
>
> Thanks, this is really the root cause, from what I understand.
>
> > It also shows that for the discrete GPU nothing changes, so the bug
> > does not occur there.
> >
> > I was also able to work around the bug by calling ttm_device_init()
> > with use_dma32 = false from amdgpu_ttm_init()
> > (drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c), but I'm not sure if this
> > has unwanted side effects.
> >
> > int amdgpu_ttm_init(struct amdgpu_device *adev)
> > {
> > 	uint64_t gtt_size;
> > 	int r;
> >
> > 	mutex_init(&adev->mman.gtt_window_lock);
> >
> > 	dma_set_max_seg_size(adev->dev, UINT_MAX);
> > 	/* No others user of address space so set it to 0 */
> > 	dev_info(adev->dev, "%s: calling ttm_device_init() with use_dma32 = 0 ignoring %d\n",
> > 		 __func__, dma_addressing_limited(adev->dev));
> > 	r = ttm_device_init(&adev->mman.bdev, &amdgpu_bo_driver, adev->dev,
> > 			    adev_to_drm(adev)->anon_inode->i_mapping,
> > 			    adev_to_drm(adev)->vma_offset_manager,
> > 			    adev->need_swiotlb,
> > 			    false /* use_dma32 */);
> > 	if (r) {
> > 		DRM_ERROR("failed initializing buffer object driver(%d).\n", r);
> > 		return r;
> > 	}
>
> I think this brings us really close. Instead of forcing use_dma32 to
> false, I wonder if we need something like
>
> 	uint64_t dma_bits = fls64(dma_get_mask(adev->dev));
>
> and pass the last argument of ttm_device_init() (use_dma32) as
> dma_bits < 32?
>
> Thanks,
> Balbir Singh
Do these address bits have to shift when using nokaslr or PCI_P2PDMA? I
think this shift is what causes the increase of the required_mask to
0x3fffffffffff:

@@ -104,4 +104,4 @@
 fe30300000-fe303fffff : 0000:04:00.0
 fe30400000-fe30403fff : 0000:04:00.0
 fe30404000-fe30404fff : 0000:04:00.0
-afe00000000-affffffffff : 0000:03:00.0
+3ffe00000000-3fffffffffff : 0000:03:00.0

And what memory is this? It is 8G in size, so it could be the RAM of the
discrete GPU (which is at PCI 0000:03:00.0), but that is already here
(part of /proc/iomem):

1010000000-ffffffffff : PCI Bus 0000:00
  fc00000000-fe0fffffff : PCI Bus 0000:01
    fc00000000-fe0fffffff : PCI Bus 0000:02
      fc00000000-fe0fffffff : PCI Bus 0000:03
        fc00000000-fdffffffff : 0000:03:00.0    GPU RAM
        fe00000000-fe0fffffff : 0000:03:00.0

lspci -v reports 8G of memory at 0xfc00000000, so I assumed that is the
GPU RAM:

03:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 23 [Radeon RX 6600/6600 XT/6600M] (rev c3)
	Subsystem: Micro-Star International Co., Ltd. [MSI] Device 1313
	Flags: bus master, fast devsel, latency 0, IRQ 107, IOMMU group 14
	Memory at fc00000000 (64-bit, prefetchable) [size=8G]
	Memory at fe00000000 (64-bit, prefetchable) [size=256M]
	Memory at fca00000 (32-bit, non-prefetchable) [size=1M]
	Expansion ROM at fcb00000 [disabled] [size=128K]

Bert Karwatzki