On Mon, 2 Jun 2025 at 21:51, Christian König <christian.koe...@amd.com> wrote:
>
> On 6/1/25 22:50, Dave Airlie wrote:
> > Hey,
> >
> > I've been playing a bit with nouveau on aarch64, and I noticed ttm
> > translates ttm_uncached into pgprot_noncached which uses
> > MT_DEVICE_nGnRnE. This is of course a device mapping which isn't
> > appropriate for memory.
> >
> > For main memory we should be using pgprot_dmacoherent which translates
> > to MT_NORMAL_NC,
> > pgprot_writecombine also translates to MT_NORMAL_NC.
> >
> > Now I'm not sure anything gets this wrong right now, (except maybe
> > nouveau), but I'm wondering would adding a ttm_uncached_ram caching
> > type and rename ttm_uncached to ttm_uncached_device, if that would be
> > a good idea?
>
> Let me ask the other way around: Why is nouveau still using ttm_uncached with 
> system memory?
>
> IIRC there are only two use cases for ttm_uncached: >15year old AGP systems 
> which for some reason can't handle write combine and MMIO BARs.
>
> E.g. for amdgpu the doorbells and HDP remapping are mapped with ttm_uncached 
> these days but nothing else.

Well, I'm not 100% sure what is valid for nouveau to be doing here.

There are three types of aarch64 deployments from NVIDIA:
1. dGPU in an aarch64 system
2. Tegra, where the interconnect isn't PCIe
3. Grace Hopper, where there is NVLink or some such between the CPU and
GPU and in theory everything is coherent.

I've been trying to get the last one working a bit better (but I've
given up for now). On these systems there are a bunch of things we'd
normally place in VRAM, but for other reasons we now have to place
them in GTT. On ARM64, NVIDIA seems to force uncached mappings for
these particular shared memory regions. Nouveau also has a similar
path in place for Tegra systems, where nouveau_sgdma.c picks
ttm_uncached, and then we end up vmap'ing that page to map it into the
kernel side. The vmap turns that into pgprot_noncached, which to me is
the wrong thing for system memory. It doesn't seem right to pick
ttm_write_combined here, though that might just work, hence why I
suggested adding ttm_uncached_ram.
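
As a rough illustration of the mapping under discussion, here is a
small userspace model, not kernel code. The enum and function names
(model_caching, model_arm64_memtype, model_ok_for_system_ram) are
made up for this sketch, MODEL_UNCACHED_RAM stands in for the
proposed ttm_uncached_ram, which does not exist in TTM today, and the
MT_NORMAL entry for cached mappings is an assumption about the
default cacheable attribute.

```c
#include <string.h>

/* Userspace model only. All names here are hypothetical. */
enum model_caching {
	MODEL_UNCACHED_DEVICE,	/* today's ttm_uncached */
	MODEL_UNCACHED_RAM,	/* proposed ttm_uncached_ram (hypothetical) */
	MODEL_WRITE_COMBINED,	/* ttm_write_combined */
	MODEL_CACHED,		/* ttm_cached */
};

/* arm64 memory type each caching mode would end up with. */
const char *model_arm64_memtype(enum model_caching c)
{
	switch (c) {
	case MODEL_UNCACHED_DEVICE:
		return "MT_DEVICE_nGnRnE";	/* via pgprot_noncached */
	case MODEL_UNCACHED_RAM:
		return "MT_NORMAL_NC";		/* via pgprot_dmacoherent */
	case MODEL_WRITE_COMBINED:
		return "MT_NORMAL_NC";		/* via pgprot_writecombine */
	case MODEL_CACHED:
		return "MT_NORMAL";		/* assumed default cacheable */
	}
	return "unknown";
}

/* Device memory types are not appropriate for system RAM. */
int model_ok_for_system_ram(enum model_caching c)
{
	return strncmp(model_arm64_memtype(c), "MT_DEVICE", 9) != 0;
}
```

In this model the uncached-RAM and write-combined cases land on the
same memory type, which is the point Christian raises below, while
only the device case is flagged as unsuitable for system memory.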

>
> > Has anyone else come across this problem with TTM on aarch64? or
> > understand if I'm missing something.
>
> If I'm not completely mistaken both pgprot_dmacoherent and 
> pgprot_writecombine map to MT_NORMAL_NC because there is no such thing as 
> uncached system memory without write combining on aarch64.
>
> I mean why would you want to do this except for getting the MMIO write 
> ordering right? Avoiding write memory barriers?

I'm not 100% sure why Tegra does it in the first place. I suspect it
was working around a lack of knowledge about what is correct: hey,
this works, so move on.

Dave.
