On 3/25/25 18:35, Christian König wrote:
> Am 24.03.25 um 23:48 schrieb Balbir Singh:
>>>> lspci -v reports 8G of memory at 0xfc00000000 so I assumed that is the 
>>>> GPU RAM.
>>>> 03:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 23
>>>> [Radeon RX 6600/6600 XT/6600M] (rev c3)
>>>>    Subsystem: Micro-Star International Co., Ltd. [MSI] Device 1313
>>>>    Flags: bus master, fast devsel, latency 0, IRQ 107, IOMMU group 14
>>>>    Memory at fc00000000 (64-bit, prefetchable) [size=8G]
>>>>    Memory at fe00000000 (64-bit, prefetchable) [size=256M]
>>>>    Memory at fca00000 (32-bit, non-prefetchable) [size=1M]
>>>>    Expansion ROM at fcb00000 [disabled] [size=128K]
>>> Well when you set nokaslr then that moves the BAR address of the dGPU above 
>>> the limit the integrated GPU can access on the bus (usually 40 bits).
>>>
>>> Because of this the integrated GPU starts to fall back to system memory 
>>> below the 4GB limit to make sure that the stuff is always accessible by 
>>> everyone.
>> Why does it fall back to GPU_DMA32? Is the rest of system memory not usable 
>> (up to 40 bits)?
> 
> We need to guarantee that we don't run into using bounce buffers since the 
> high level APIs don't necessarily inform the kernel about the state 
> transitions for that.
> 

So effectively, on larger systems (CPUs with more than 40 bits of addressing),
the iGPU always has to go through the DMA32 window?
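
To make the addressing question concrete, here is a minimal sketch that checks
whether a BAR range is fully reachable under a given address-width limit. The
two ranges are copied from the lspci output quoted above; the 40-bit and 32-bit
limits come from the "usually 40 bits" and DMA32 figures in this thread. The
helper name fits_in_mask() is made up for illustration and is not a kernel or
amdgpu interface.

```python
GIB = 1 << 30
MIB = 1 << 20


def fits_in_mask(start: int, size: int, bits: int) -> bool:
    """Return True if the whole range [start, start + size) lies below 2**bits."""
    last_byte = start + size - 1
    return last_byte < (1 << bits)


# BAR ranges as reported by lspci -v earlier in this thread.
bars = {
    "8G prefetchable": (0xFC00000000, 8 * GIB),
    "256M prefetchable": (0xFE00000000, 256 * MIB),
}

for name, (start, size) in bars.items():
    last_byte = start + size - 1
    print(
        f"{name}: {start:#x}-{last_byte:#x} "
        f"40-bit={fits_in_mask(start, size, 40)} "
        f"32-bit={fits_in_mask(start, size, 32)}"
    )
```

Running the same check on the BAR addresses from both boots (with and without
nokaslr) would show whether the range crosses the assumed 40-bit limit in one
of them.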

>> I did not realize that the iGPU is using the BAR memory of the dGPU.
> 
> When the displayed content is rendered by the dGPU but the monitor connected 
> to the iGPU you somehow need to get the image to the iGPU.
> 
> The most efficient approach is that the iGPU copies the image from the dGPU's 
> BAR directly into memory it can scan out from.
> 
> Alternatively we can allocate some system memory, the dGPU copies the image 
> into that and the iGPU then copies the image into the scanout buffer.
> 
> Some newer hardware can also directly scan out from that system memory, 
> avoiding that extra copy. But this has a bunch of prerequisites, for example 
> the IOMMU needs to be disabled or in pass-through mode.
> 
>> I guess the issue goes away when amdgpu.gttsize is set to 2GB, because 2GB 
>> fits in the DMA32 window
> 
> Well I would not say that the issue goes away, it just makes your symptoms go 
> away.

Agreed

> 
> The trick is that the gttsize is what we give to the Steam game as maximum 
> amount of system memory it can allocate. So it most likely stays below that 
> and so the extra system memory buffer for scanout can also fit below 4GB.
> 
>>> Since the memory below 4GB is very very limited we are now starting to 
>>> constantly swap things in and out of that area. Basically completely 
>>> killing the performance of your Steam game.
>>>
>>> As far as I can see till that point the handling is completely intentional 
>>> and working as expected.
>>>
>>> The only thing which eludes me is why setting nokaslr changes the BAR of 
>>> the dGPU? Can I get the full dmesg with and without nokaslr?
>>>
>> IIRC, the iGPU does not work correctly, the dGPU does, so it's an iGPU 
>> addressing constraint?
> 
> The problem is more that the iGPU doesn't have any local memory, but rather 
> just uses (potentially stolen) system memory.
> 
> But the question remains: Why does the BAR move around? That should most 
> likely not happen.
> 

The second region seems to be additional; I suspect it is the HMM mapping from
kgd2kfd_init_zone_device().

Balbir Singh



> Regards,
> Christian.
> 
>> Balbir
>>
> 
