* Balbir Singh <balb...@nvidia.com> wrote:

> On 3/20/25 20:01, Ingo Molnar wrote:
> > 
> > * Balbir Singh <balb...@nvidia.com> wrote:
> > 
> >> On 3/17/25 00:09, Bert Karwatzki wrote:
> >>> This is related to amdgpu.gttsize. My laptop has the maximum amount 
> >>> of memory (64G) and usually gttsize is half of main memory size. I just 
> >>> tested with cmdline="nokaslr amdgpu.gttsize=2048" and the problem does 
> >>> not occur. So I did some more testing with varying gttsize and got this
> >>> for the built-in GPU
> >>>
> >>> 08:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI]
> >>> Cezanne [Radeon Vega Series / Radeon Vega Mobile Series] (rev c5)
> >>>
> >>> (nokaslr is always enabled)
> >>> gttsize   input behaviour
> >>>  2048             GOOD
> >>>  2064             GOOD
> >>>  2080             SEMIBAD (i.e. noticeable input lag but not as bad as below)
> >>>  3072             BAD
> >>>  4096             BAD
> >>>  8192             BAD
> >>> 16384             BAD
> >>>
> >>> As the built-in GPU has ~512M of VRAM, there seem to be problems when
> >>> gttsize > 4*VRAM, so I tested the discrete GPU with 8G of VRAM
> >>> gttsize   input behaviour
> >>> 49152             GOOD
> >>> 64000             GOOD
> >>>
> >>> So for the discrete GPU, increasing gttsize does not reproduce the bug.
> >>>
> >>
> >> Very interesting, I am not a GTT expert, but with these experiments do you
> >> find anything interesting in
> >>
> >> /sys/kernel/debug/x86/pat_memtype_list?
> >>
> >> It's weird that you don't see any issues in Xorg (Xfce), just the games.
> >> Maybe we should get help from the amd-gfx experts to further diagnose/debug
> >> the interaction of nokaslr with the game.
> > 
> > So basically your commit:
> > 
> >   7ffb791423c7 ("x86/kaslr: Reduce KASLR entropy on most x86 systems")
> > 
> > inflicts part of the effects of a 'nokaslr' boot command line option, 
> > and triggers the regression due to that?
> > 
> > Or is there some other cause?
> > 
> 
> You are right in your assessment of the root cause. Just to reiterate
>
> - nokaslr does not work with the iGPU, specifically for the games 
>   mentioned
>
> - There is a workaround for the problem, which involves reducing the 
>   amdgpu.gttsize
>
> - The patch exposes the system to the same effect as nokaslr when 
>   PCI_P2PDMA is enabled

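For reference, the gttsize workaround in the summary above rests on the iGPU measurements quoted earlier in this thread. A minimal Python sketch that checks the reported "gttsize > 4*VRAM" observation against those rows follows. It assumes gttsize is given in MiB and reads the report's "~512" as 512 MiB of VRAM; the function names are made up for illustration.

```python
# iGPU measurements as quoted in the report (nokaslr always set).
# Assumption: gttsize values are in MiB.
OBSERVED = {
    2048: "GOOD",
    2064: "GOOD",
    2080: "SEMIBAD",
    3072: "BAD",
    4096: "BAD",
    8192: "BAD",
    16384: "BAD",
}

VRAM_MIB = 512  # assumption: "~512" in the report read as 512 MiB


def exceeds_threshold(gttsize_mib, vram_mib=VRAM_MIB):
    """True when gttsize is larger than four times VRAM."""
    return gttsize_mib > 4 * vram_mib


def mismatches(observed=OBSERVED, vram_mib=VRAM_MIB):
    """Return gttsize values where the heuristic disagrees with the report."""
    out = []
    for size, behaviour in sorted(observed.items()):
        predicted_bad = exceeds_threshold(size, vram_mib)
        actually_bad = behaviour != "GOOD"  # SEMIBAD counts as a regression
        if predicted_bad != actually_bad:
            out.append(size)
    return out


if __name__ == "__main__":
    print("rows not explained by gttsize > 4*VRAM:", mismatches())
```

Under these assumptions the only row the heuristic does not explain is 2064, so the real cutoff sits somewhere between 2064 and 2080 rather than exactly at 4*VRAM, which fits the "~" in the reported VRAM size.
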
Note that every major x86 distro I checked enables CONFIG_PCI_P2PDMA=y 
and also keeps KASLR enabled, so the above qualifiers are immaterial in 
terms of user impact: it's a 100% certainty that distro kernels on 
these systems will regress under these games, right?
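
A rough way to check a given kernel build for this combination is sketched below in Python. It only inspects config text for CONFIG_PCI_P2PDMA=y together with CONFIG_RANDOMIZE_BASE=y (the x86 KASLR option); it says nothing about whether the hardware or the games in question are affected. The helper names are hypothetical.

```python
def parse_kconfig(text):
    """Map option name -> value from kernel .config text.

    'CONFIG_X=val' lines are stored as-is; '# CONFIG_X is not set'
    lines are stored as 'n'.
    """
    suffix = " is not set"
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            options[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def has_p2pdma_and_kaslr(text):
    """True when both PCI_P2PDMA and KASLR are built in."""
    opts = parse_kconfig(text)
    return (opts.get("CONFIG_PCI_P2PDMA") == "y"
            and opts.get("CONFIG_RANDOMIZE_BASE") == "y")


if __name__ == "__main__":
    sample = "CONFIG_RANDOMIZE_BASE=y\nCONFIG_PCI_P2PDMA=y\n"
    print(has_p2pdma_and_kaslr(sample))
```

On many distros the running kernel's config can be read from /boot/config-<release> and passed in as text; that path is a common convention, not something guaranteed on every system.
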

What is the importance of the original fix? I should have insisted on a 
fuller changelog, because it's rather thin on details:

  If the BAR address is beyond this limit, PCI peer to peer DMA
  mappings fail.

How frequently does this happen and what is the impact to users if this 
happens?

We might be forced to revert this change if it regresses other systems.

Thanks,

        Ingo
