On Thu, May 14, 2026 at 09:49:33AM -0400, Gregory Price wrote:
> On Tue, May 12, 2026 at 05:05:54PM -0400, Michael S. Tsirkin wrote:
> > When post_alloc_hook() needs to zero a page for an explicit
> > __GFP_ZERO allocation for a user page (user_addr is set), use
> > folio_zero_user() instead of kernel_init_pages().  This zeros near
> > the faulting address last, keeping those cachelines hot for the
> > impending user access.
> > 
> > folio_zero_user() is only used for explicit __GFP_ZERO, not for
> > init_on_alloc.  On architectures with virtually-indexed caches
> > (e.g., ARM), clear_user_highpage() performs per-line cache
> > operations; using it for init_on_alloc would add overhead that
> > kernel_init_pages() avoids (the page fault path flushes the
> > cache at PTE installation time regardless).
> > 
> > No functional change yet: current callers do not pass __GFP_ZERO
> > for user pages (they zero at the callsite instead).  Subsequent
> > patches will convert them.
> > 
> > Signed-off-by: Michael S. Tsirkin <[email protected]>
> > Assisted-by: Claude:claude-opus-4-6
> > ---
> >  mm/page_alloc.c | 17 ++++++++++++++---
> >  1 file changed, 14 insertions(+), 3 deletions(-)
> > 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index db387dd6b813..76f39dd026ff 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1861,9 +1861,20 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
> >             for (i = 0; i != 1 << order; ++i)
> >                     page_kasan_tag_reset(page + i);
> >     }
> > -   /* If memory is still not initialized, initialize it now. */
> > -   if (init)
> > -           kernel_init_pages(page, 1 << order);
> > +   /*
> > +    * If memory is still not initialized, initialize it now.
> > +    * When __GFP_ZERO was explicitly requested and user_addr is set,
> > +    * use folio_zero_user() which zeros near the faulting address
> > +    * last, keeping those cachelines hot.  For init_on_alloc, use
> > +    * kernel_init_pages() to avoid unnecessary cache flush overhead
> > +    * on architectures with virtually-indexed caches.
> > +    */
> > +   if (init) {
> > +           if ((gfp_flags & __GFP_ZERO) && user_addr != USER_ADDR_NONE)
> > +                   folio_zero_user(page_folio(page), user_addr);
> > +           else
> > +                   kernel_init_pages(page, 1 << order);
> > +   }
> 
> Open question but not necessarily in-scope:
> 
> Should __GFP_ZERO just be implied if (user_addr != USER_ADDR_NONE)?


There are callers that omit __GFP_ZERO, but in each case the page is
filled from another source (or zeroed separately) before userspace sees it:

  - drm_pagemap.c: GFP_HIGHUSER, no zero. This is DRM device page
    migration; the page content is preserved from the source.

  - test_hmm.c: GFP_HIGHUSER_MOVABLE, no zero. Test driver; pages get
    their content from the device.

  - mm/ksm.c: GFP_HIGHUSER_MOVABLE, no zero. KSM merges identical
    pages; content comes from the source page (copy).

  - mm/memory.c (new_folio): GFP_HIGHUSER_MOVABLE, no zero. This is
    CoW; content is copied from the old page.

  - mm/userfaultfd.c: GFP_HIGHUSER_MOVABLE, no zero. Content comes
    from userspace via userfaultfd.

  - arm64/fault.c: __GFP_ZEROTAGS, not __GFP_ZERO. This is MTE tag
    zeroing, not page zeroing; the page is zeroed separately.
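
To make the cases concrete, here is a rough userspace model of the
branch in the hunk above. It is only a sketch: MODEL_GFP_ZERO,
MODEL_USER_ADDR_NONE, enum zero_path and pick_zero_path() are made-up
names, the flag values are arbitrary, and "init" is reduced to
"init_on_alloc is on or __GFP_ZERO was passed", which ignores the KASAN
and tag handling that the real post_alloc_hook() does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the kernel's real values. */
#define MODEL_GFP_ZERO        0x1u
#define MODEL_USER_ADDR_NONE  (~0UL)

enum zero_path {
	ZERO_NONE,         /* allocator leaves the page uninitialized */
	ZERO_KERNEL_INIT,  /* models kernel_init_pages() */
	ZERO_FOLIO_USER,   /* models folio_zero_user() */
};

/*
 * Models only the branch from the quoted hunk.  "init" is simplified
 * (assumption): init_on_alloc enabled, or __GFP_ZERO requested.
 */
static enum zero_path pick_zero_path(unsigned int gfp, unsigned long user_addr,
				     bool init_on_alloc)
{
	bool init = init_on_alloc || (gfp & MODEL_GFP_ZERO);

	if (!init)
		return ZERO_NONE;
	if ((gfp & MODEL_GFP_ZERO) && user_addr != MODEL_USER_ADDR_NONE)
		return ZERO_FOLIO_USER;
	return ZERO_KERNEL_INIT;
}

/* The callers listed above: user address set, no __GFP_ZERO. */
static void check_examples(void)
{
	/* init_on_alloc off: nothing is zeroed, the caller fills the page. */
	assert(pick_zero_path(0, 0x1000UL, false) == ZERO_NONE);
	/* init_on_alloc on: zeroed, but via the kernel_init_pages() path. */
	assert(pick_zero_path(0, 0x1000UL, true) == ZERO_KERNEL_INIT);
}
```

In this model only an explicit __GFP_ZERO together with a user address
reaches the folio_zero_user() path, so none of the callers above would
change behaviour with this patch.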


> Putting aside how that's done without introducing another gfp flag
> (maybe something explicit like `alloc_pages_nozero(...)` ), it seems
> like a very short jump to just adding __GFP_ZERO to any user-alloc by
> default.
> 
> I'd be curious to know how many callers across the system omit
> __GFP_ZERO when allocating a user-page, and whether there might be
> scenarios where we subtly miss it (seems unlikely and narrow, but very
> possibly something a driver could do unintentionally).
> 
> ~Gregory


I'd do this on top if possible.

-- 
MST

