On 3/24/25 23:14, Christian König wrote:
> On 24.03.25 at 12:23, Bert Karwatzki wrote:
>> On Sunday, 23.03.2025 at 17:51 +1100, Balbir Singh wrote:
>>> On 3/22/25 23:23, Bert Karwatzki wrote:
>>>> ...
>>>> So why is use_dma32 enabled with nokaslr? Some more printk()s give
>>>> this result:
>>>>
>>>> The GPUs:
>>>> built-in:
>>>> 08:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne [Radeon Vega Series / Radeon Vega Mobile Series] (rev c5)
>>>> discrete:
>>>> 03:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 23 [Radeon RX 6600/6600 XT/6600M] (rev c3)
>>>>
>>>> With nokaslr:
>>>> [ 1.266517] [ T328] dma_addressing_limited: mask = 0xfffffffffff bus_dma_limit = 0x0 required_mask = 0xfffffffff
>>>> [ 1.266519] [ T328] dma_addressing_limited: ops = 0000000000000000 use_dma_iommu(dev) = 0
>>>> [ 1.266520] [ T328] dma_direct_all_ram_mapped: returning true
>>>> [ 1.266521] [ T328] dma_addressing_limited: returning ret = 0
>>>> [ 1.266521] [ T328] amdgpu 0000:03:00.0: amdgpu: amdgpu_ttm_init: calling ttm_device_init() with use_dma32 = 0
>>>> [ 1.266525] [ T328] entering ttm_device_init, use_dma32 = 0
>>>> [ 1.267115] [ T328] entering ttm_pool_init, use_dma32 = 0
>>>>
>>>> [ 3.965669] [ T328] dma_addressing_limited: mask = 0xfffffffffff bus_dma_limit = 0x0 required_mask = 0x3fffffffffff
>>>> [ 3.965671] [ T328] dma_addressing_limited: returning true
>>>> [ 3.965672] [ T328] amdgpu 0000:08:00.0: amdgpu: amdgpu_ttm_init: calling ttm_device_init() with use_dma32 = 1
>>>> [ 3.965674] [ T328] entering ttm_device_init, use_dma32 = 1
>>>> [ 3.965747] [ T328] entering ttm_pool_init, use_dma32 = 1
>>>>
>>>> Without nokaslr:
>>>> [ 1.300907] [ T351] dma_addressing_limited: mask = 0xfffffffffff bus_dma_limit = 0x0 required_mask = 0xfffffffff
>>>> [ 1.300909] [ T351] dma_addressing_limited: ops = 0000000000000000 use_dma_iommu(dev) = 0
>>>> [ 1.300910] [ T351] dma_direct_all_ram_mapped: returning true
>>>> [ 1.300910] [ T351] dma_addressing_limited: returning ret = 0
>>>> [ 1.300911] [ T351] amdgpu 0000:03:00.0: amdgpu: amdgpu_ttm_init: calling ttm_device_init() with use_dma32 = 0
>>>> [ 1.300915] [ T351] entering ttm_device_init, use_dma32 = 0
>>>> [ 1.301210] [ T351] entering ttm_pool_init, use_dma32 = 0
>>>>
>>>> [ 4.000602] [ T351] dma_addressing_limited: mask = 0xfffffffffff bus_dma_limit = 0x0 required_mask = 0xfffffffffff
>>>> [ 4.000603] [ T351] dma_addressing_limited: ops = 0000000000000000 use_dma_iommu(dev) = 0
>>>> [ 4.000604] [ T351] dma_direct_all_ram_mapped: returning true
>>>> [ 4.000605] [ T351] dma_addressing_limited: returning ret = 0
>>>> [ 4.000606] [ T351] amdgpu 0000:08:00.0: amdgpu: amdgpu_ttm_init: calling ttm_device_init() with use_dma32 = 0
>>>> [ 4.000610] [ T351] entering ttm_device_init, use_dma32 = 0
>>>> [ 4.000687] [ T351] entering ttm_pool_init, use_dma32 = 0
>>>>
>>>> So with nokaslr the required mask for the built-in GPU changes from
>>>> 0xfffffffffff to 0x3fffffffffff, which causes dma_addressing_limited()
>>>> to return true, which in turn causes ttm_device_init() to be called
>>>> with use_dma32 = true.
>>> Thanks, this is really the root cause, from what I understand.
>
> Yeah, completely agree.
>
>>>
>>>> It also shows that for the discrete GPU nothing changes, so the bug
>>>> does not occur there.
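For reference, a minimal user-space sketch of the comparison that
dma_addressing_limited() effectively makes here, using the mask values
from the logs above (an illustration only, not the kernel
implementation; bus_dma_limit is 0 in both runs, so it is ignored):

	/* Illustration only: demo of the check that trips with nokaslr. */
	#include <stdio.h>
	#include <stdint.h>

	int main(void)
	{
		uint64_t mask             = 0xfffffffffffULL;   /* 44-bit device mask */
		uint64_t required_kaslr   = 0xfffffffffffULL;   /* without nokaslr */
		uint64_t required_nokaslr = 0x3fffffffffffULL;  /* with nokaslr: 46 bits */

		/* dma_addressing_limited() reports true when the device mask
		 * cannot cover the highest DMA address the platform may hand
		 * out. */
		printf("without nokaslr: limited = %d\n", required_kaslr   > mask);
		printf("with    nokaslr: limited = %d\n", required_nokaslr > mask);
		return 0;
	}

This prints limited = 0 without nokaslr and limited = 1 with it, which
matches the use_dma32 values seen in the logs.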
>>>>
>>>> I also was able to work around the bug by calling ttm_device_init()
>>>> with use_dma32=false from amdgpu_ttm_init()
>>>> (drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c), but I'm not sure if this
>>>> has unwanted side effects.
>>>>
>>>> int amdgpu_ttm_init(struct amdgpu_device *adev)
>>>> {
>>>> 	uint64_t gtt_size;
>>>> 	int r;
>>>>
>>>> 	mutex_init(&adev->mman.gtt_window_lock);
>>>>
>>>> 	dma_set_max_seg_size(adev->dev, UINT_MAX);
>>>> 	/* No other users of the address space, so set it to 0 */
>>>> 	dev_info(adev->dev, "%s: calling ttm_device_init() with use_dma32 = 0 ignoring %d\n",
>>>> 		 __func__, dma_addressing_limited(adev->dev));
>>>> 	r = ttm_device_init(&adev->mman.bdev, &amdgpu_bo_driver, adev->dev,
>>>> 			    adev_to_drm(adev)->anon_inode->i_mapping,
>>>> 			    adev_to_drm(adev)->vma_offset_manager,
>>>> 			    adev->need_swiotlb,
>>>> 			    false /* use_dma32 */);
>>>> 	if (r) {
>>>> 		DRM_ERROR("failed initializing buffer object driver(%d).\n", r);
>>>> 		return r;
>>>> 	}
>>>>
>>> I think this brings us really close. Instead of forcing use_dma32 to
>>> false, I wonder if we need something like
>>>
>>> 	uint64_t dma_bits = fls64(dma_get_mask(adev->dev));
>>>
>>> and then pass the last argument of ttm_device_init() (use_dma32) as
>>> dma_bits < 32 (see the sketch after this message)?
>
> The handling is completely correct as far as I can see.
>
>>>
>>>
>>> Thanks,
>>> Balbir Singh
>>>
>> Do these address bits have to shift when using nokaslr or PCI_P2PDMA?
>> I think this shift causes the increase of the required_mask to
>> 0x3fffffffffff:
>>
>> @@ -104,4 +104,4 @@
>>  fe30300000-fe303fffff : 0000:04:00.0
>>  fe30400000-fe30403fff : 0000:04:00.0
>>  fe30404000-fe30404fff : 0000:04:00.0
>> -afe00000000-affffffffff : 0000:03:00.0
>> +3ffe00000000-3fffffffffff : 0000:03:00.0
>>
>> And what memory is this? It's 8G in size, so it could be the RAM of
>> the discrete GPU (which is at PCI 0000:03:00.0), but that is already
>> here (part of /proc/iomem):
>>
>> 1010000000-ffffffffff : PCI Bus 0000:00
>> fc00000000-fe0fffffff : PCI Bus 0000:01
>> fc00000000-fe0fffffff : PCI Bus 0000:02
>> fc00000000-fe0fffffff : PCI Bus 0000:03
>> fc00000000-fdffffffff : 0000:03:00.0   GPU RAM
>> fe00000000-fe0fffffff : 0000:03:00.0
>>
>> lspci -v reports 8G of memory at 0xfc00000000, so I assumed that is
>> the GPU RAM:
>> 03:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 23 [Radeon RX 6600/6600 XT/6600M] (rev c3)
>> 	Subsystem: Micro-Star International Co., Ltd. [MSI] Device 1313
>> 	Flags: bus master, fast devsel, latency 0, IRQ 107, IOMMU group 14
>> 	Memory at fc00000000 (64-bit, prefetchable) [size=8G]
>> 	Memory at fe00000000 (64-bit, prefetchable) [size=256M]
>> 	Memory at fca00000 (32-bit, non-prefetchable) [size=1M]
>> 	Expansion ROM at fcb00000 [disabled] [size=128K]
>
> Well, when you set nokaslr, that moves the BAR address of the dGPU
> above the limit the integrated GPU can access on the bus (usually 40
> bits).
>
> Because of this, the integrated GPU starts to fall back to system
> memory below the 4GB limit to make sure that the stuff is always
> accessible by everyone.
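For concreteness, the fls64() idea quoted above might look roughly like
this in amdgpu_ttm_init(). This is an untested sketch; note that an
exactly 32-bit mask gives fls64() == 32, so the comparison likely wants
to be <= 32 rather than < 32:

	/* Untested sketch: derive use_dma32 from the width of the device's
	 * own DMA mask instead of from dma_addressing_limited(), so the
	 * 44-bit iGPU does not drop into the DMA32 pool just because a
	 * peer BAR was placed above its mask.
	 */
	uint64_t dma_bits = fls64(dma_get_mask(adev->dev));

	r = ttm_device_init(&adev->mman.bdev, &amdgpu_bo_driver, adev->dev,
			    adev_to_drm(adev)->anon_inode->i_mapping,
			    adev_to_drm(adev)->vma_offset_manager,
			    adev->need_swiotlb,
			    dma_bits <= 32 /* use_dma32 */);

Whether dropping the dma_addressing_limited() input entirely is safe is
exactly the open question in this thread, since the limited case is what
keeps buffers reachable by every peer on the bus.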
Why does it fall back to GFP_DMA32? Is the rest of system memory (up to
40 bits) not usable? I did not realize that the iGPU is using the BAR
memory of the dGPU. I guess the issue goes away when amdgpu.gttsize is
set to 2GB, because 2GB fits in the DMA32 window.

> Since the memory below 4GB is very, very limited, we are now starting
> to constantly swap things in and out of that area, basically
> completely killing the performance of your Steam game.
>
> As far as I can see, up to that point the handling is completely
> intentional and working as expected.
>
> The only thing which eludes me is why setting nokaslr changes the BAR
> of the dGPU. Can I get the full dmesg with and without nokaslr?
>

IIRC, the iGPU does not work correctly while the dGPU does, so it's an
iGPU addressing constraint?

Balbir
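On the "why all the way down to 4GB" question above: use_dma32 is a
single bool, and on the allocator side the only zone available below
normal memory on this configuration is ZONE_DMA32, so there is no
"below 40 bits" zone to fall back to. Roughly what the TTM pool does
with the flag (a paraphrased sketch of the allocation path in
drivers/gpu/drm/ttm/ttm_pool.c, not a verbatim quote):

	/* Paraphrased sketch: once the pool was created with use_dma32,
	 * every page allocation carries GFP_DMA32 and is therefore
	 * satisfied from ZONE_DMA32, i.e. physical memory below 4GB.
	 */
	gfp_t gfp_flags = GFP_KERNEL | __GFP_ZERO;

	if (pool->use_dma32)
		gfp_flags |= GFP_DMA32;

	p = alloc_pages(gfp_flags, order);

That all-or-nothing zone choice is why the fallback hurts so much: the
iGPU could address 44 bits of system memory, but the pool can only ask
the page allocator for "anywhere" or "below 4GB".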