On 3/25/25 18:35, Christian König wrote:
> On 24.03.25 at 23:48, Balbir Singh wrote:
>>>> lspci -v reports 8G of memory at 0xfc00000000 so I assumed that is the
>>>> GPU RAM.
>>>> 03:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 23
>>>> [Radeon RX 6600/6600 XT/6600M] (rev c3)
>>>> Subsystem: Micro-Star International Co., Ltd. [MSI] Device 1313
>>>> Flags: bus master, fast devsel, latency 0, IRQ 107, IOMMU group 14
>>>> Memory at fc00000000 (64-bit, prefetchable) [size=8G]
>>>> Memory at fe00000000 (64-bit, prefetchable) [size=256M]
>>>> Memory at fca00000 (32-bit, non-prefetchable) [size=1M]
>>>> Expansion ROM at fcb00000 [disabled] [size=128K]
>>> Well when you set nokaslr then that moves the BAR address of the dGPU above
>>> the limit the integrated GPU can access on the bus (usually 40 bits).
>>>
>>> Because of this the integrated GPU starts to fall back to system memory
>>> below the 4GB limit to make sure that the stuff is always accessible by
>>> everyone.
>> Why does it fall back to GPU_DMA32? Is the rest of system memory not usable
>> (up to 40 bits)?
>
> We need to guarantee that we don't run into using bounce buffers since the
> high level APIs don't necessarily inform the kernel about the state
> transitions for that.
>

So effectively on larger systems (CPUs with more than 40 bits of addressing),
the iGPU always has to go through the DMA32 window?

>> I did not realize that the iGPU is using the BAR memory of the dGPU.
>
> When the displayed content is rendered by the dGPU but the monitor is
> connected to the iGPU you somehow need to get the image to the iGPU.
>
> The most efficient approach is that the iGPU copies the image from the dGPU's
> BAR directly into memory it can scan out from.
>
> Alternatively we can allocate some system memory, the dGPU copies the image
> into that and the iGPU then copies the image into the scanout buffer.
>
> Some newer hardware can also directly scan out from that system memory and so
> avoid that extra copy. But this has a bunch of prerequisites, for example
> IOMMU needs to be disabled or in passthrough mode.
>
>> I guess the issue goes away when amdgpu.gttsize is set to 2GB, because 2GB
>> fits in the DMA32 window
>
> Well I would not say that the issue goes away, it just makes your symptoms go
> away.

Agreed

>
> The trick is that the gttsize is what we give to the Steam game as maximum
> amount of system memory it can allocate. So it most likely stays below that
> and so the extra system memory buffer for scanout can also fit below 4GB.
>
>>> Since the memory below 4GB is very very limited we are now starting to
>>> constantly swap things in and out of that area. Basically completely
>>> killing the performance of your Steam game.
>>>
>>> As far as I can see till that point the handling is completely intentional
>>> and working as expected.
>>>
>>> The only thing which eludes me is why setting nokaslr changes the BAR of
>>> the dGPU? Can I get the full dmesg with and without nokaslr?
>>>
>> IIRC, the iGPU does not work correctly, the dGPU does, so it's an iGPU
>> addressing constraint?
>
> The problem is more that the iGPU doesn't have any local memory, but rather
> just uses (potentially stolen) system memory.
>
> But the question remains: Why does the BAR move around? That should most
> likely not happen.
>

The second region seems to be additional, I suspect that is the HMM mapping
from kgd2kfd_init_zone_device()

Balbir Singh

> Regards,
> Christian.
>
>> Balbir
>>
>
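
For reference, the address arithmetic discussed above can be sketched as
follows. This is only an illustration: the 40-bit limit is the "usually 40
bits" figure mentioned in the thread, not a value read from the hardware, and
fits_below() is a made-up helper name, not a kernel or amdgpu function. The BAR
addresses and sizes are copied from the lspci output quoted above.

```python
# Sketch only: checks whether a BAR range lies entirely below a given
# address-width limit. The 40-bit figure is an assumption taken from the
# thread ("usually 40 bits"); fits_below() is a hypothetical helper.

GIB = 1 << 30
MIB = 1 << 20


def fits_below(base, size, bits):
    """Return True if [base, base + size) lies entirely below 2**bits."""
    return base + size <= (1 << bits)


# BARs copied from the lspci -v output quoted in this thread.
bars = [
    ("8G BAR", 0xFC00000000, 8 * GIB),
    ("256M BAR", 0xFE00000000, 256 * MIB),
    ("1M BAR", 0xFCA00000, 1 * MIB),
]

for name, base, size in bars:
    end = base + size - 1
    print(
        f"{name}: {base:#x}-{end:#x} "
        f"below 2^40: {fits_below(base, size, 40)} "
        f"below 4GB: {fits_below(base, size, 32)}"
    )
```

With the addresses from that lspci listing, all three regions end below 2^40,
and only the 1M non-prefetchable BAR sits below 4GB. Running the same check on
the BAR addresses from the other boot (with or without nokaslr) would show
whether the 8G region has moved past the limit.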