Hi, Christian

On Wed, 2025-08-06 at 15:28 +0200, Christian König wrote:
> On some old x86 systems we had the problem that changing the caching
> flags of system memory requires changing the global MTRR/PAT tables.
>
> But on any modern x86 system (CPUs introduced roughly after 2004) we
> actually don't need that any more and can update the caching flags
> directly in the PTEs of the CPU mapping. It was just never disabled
> because of the fear of regressions.
>
> We already use the PTE flags for encryption on x86 64bit for quite a
> while, and all other supported platforms (Sparc, PowerPC, ARM, MIPS,
> LoongArch) have never done anything different either.
IIRC from my VMware days, changing the SEV encryption mode of a page
still requires changing all mappings, including the kernel map? See
__set_memory_enc_pgtable().

> So disable the page pool completely for 64bit systems and just
> insert a clflush to be on the safe side so that we never return
> memory with dirty cache lines.
>
> Testing on a Ryzen 5 and 7 shows that this has absolutely no
> performance impact, and off hand the AMD CI can't find a problem
> either.
>
> Let's see what the i915 and XE CI systems say to that.
>
> Signed-off-by: Christian König <christian.koe...@amd.com>

I don't think we can do this.

First, Lunar Lake can in some situations, just like the old Athlons,
write back clean cache lines, which means writebacks of speculative
prefetches may overwrite GPU data. LNL makes heavy use of non-coherent
GPU mappings for performance.

Second, IIRC vm_insert_pfn_prot() on x86 will override the given
caching mode with the last caching mode set for the kernel linear map,
so if you try to set up a write-combined GPU mapping without a previous
call to set_pages_xxxxx() it will actually end up cached. See
track_pfn_insert().
/Thomas

> ---
>  drivers/gpu/drm/ttm/ttm_pool.c | 16 +++++++++++-----
>  1 file changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
> index baf27c70a419..7487eac29398 100644
> --- a/drivers/gpu/drm/ttm/ttm_pool.c
> +++ b/drivers/gpu/drm/ttm/ttm_pool.c
> @@ -38,7 +38,7 @@
>  #include <linux/highmem.h>
>  #include <linux/sched/mm.h>
>
> -#ifdef CONFIG_X86
> +#ifdef CONFIG_X86_32
>  #include <asm/set_memory.h>
>  #endif
>
> @@ -46,6 +46,7 @@
>  #include <drm/ttm/ttm_pool.h>
>  #include <drm/ttm/ttm_tt.h>
>  #include <drm/ttm/ttm_bo.h>
> +#include <drm/drm_cache.h>
>
>  #include "ttm_module.h"
>
> @@ -192,7 +193,7 @@ static void ttm_pool_free_page(struct ttm_pool *pool, enum ttm_caching caching,
>  	struct ttm_pool_dma *dma;
>  	void *vaddr;
>
> -#ifdef CONFIG_X86
> +#ifdef CONFIG_X86_32
>  	/* We don't care that set_pages_wb is inefficient here. This is only
>  	 * used when we have to shrink and CPU overhead is irrelevant then.
>  	 */
> @@ -218,7 +219,7 @@ static void ttm_pool_free_page(struct ttm_pool *pool, enum ttm_caching caching,
>  /* Apply any cpu-caching deferred during page allocation */
>  static int ttm_pool_apply_caching(struct ttm_pool_alloc_state *alloc)
>  {
> -#ifdef CONFIG_X86
> +#ifdef CONFIG_X86_32
>  	unsigned int num_pages = alloc->pages - alloc->caching_divide;
>
>  	if (!num_pages)
> @@ -232,6 +233,11 @@ static int ttm_pool_apply_caching(struct ttm_pool_alloc_state *alloc)
>  	case ttm_uncached:
>  		return set_pages_array_uc(alloc->caching_divide, num_pages);
>  	}
> +
> +#elif defined(CONFIG_X86_64)
> +	unsigned int num_pages = alloc->pages - alloc->caching_divide;
> +
> +	drm_clflush_pages(alloc->caching_divide, num_pages);
>  #endif
>  	alloc->caching_divide = alloc->pages;
>  	return 0;
> @@ -342,7 +348,7 @@ static struct ttm_pool_type *ttm_pool_select_type(struct ttm_pool *pool,
>  	if (pool->use_dma_alloc)
>  		return &pool->caching[caching].orders[order];
>
> -#ifdef CONFIG_X86
> +#ifdef CONFIG_X86_32
>  	switch (caching) {
>  	case ttm_write_combined:
>  		if (pool->nid != NUMA_NO_NODE)
> @@ -980,7 +986,7 @@ long ttm_pool_backup(struct ttm_pool *pool, struct ttm_tt *tt,
>  	    pool->use_dma_alloc || ttm_tt_is_backed_up(tt))
>  		return -EBUSY;
>
> -#ifdef CONFIG_X86
> +#ifdef CONFIG_X86_32
>  	/* Anything returned to the system needs to be cached. */
>  	if (tt->caching != ttm_cached)
>  		set_pages_array_wb(tt->pages, tt->num_pages);