On Tue, May 05, 2026 at 11:02:35AM +0200, Thomas Hellström wrote:
> On Mon, 2026-05-04 at 20:30 -0700, Matthew Brost wrote:
> > ttm_pool_split_for_swap() splits high-order pool pages into order-0
> > pages during backup so each 4K page can be released to the system as
> > soon as it has been written to shmem. While this minimizes the
> > allocator's working set during reclaim, it actively fragments memory:
> > every TTM-backed compound page that the shrinker touches is shattered
> > into order-0 pages, even when the rest of the system would prefer
> > that the high-order block stay intact. Under sustained kswapd
> > pressure this is enough to drive other parts of MM into recovery
> > loops from which they cannot easily escape, because the memory TTM
> > just freed is no longer contiguous.
> >
> > Stop unconditionally splitting on the backup path and back up each
> > compound at its native order in ttm_pool_backup():
> >
> > - For each non-handle slot, read the order from the head page and
> > back up all 1<<order subpages to consecutive shmem indices,
> > writing the resulting handles into tt->pages[] as we go.
> > - On success, the compound is freed once at its native order. No
> > split_page(), no per-4K refcount juggling, no fragmentation
> > introduced from this path.
> > - Slots that already hold a backup handle from a previous partial
> > attempt are skipped. A compound that would extend past a
> > fault-injection-truncated num_pages is skipped rather than split.
> >
> > A per-subpage backup failure cannot be made fully atomic: backing up
> > a subpage allocates a shmem folio before the source page can be
> > released, so under true OOM any subpage in a compound (not just the
> > first) may fail to be backed up with the rest of the source compound
> > still live and contiguous. To make forward progress in that case,
> > fall back to splitting the source compound and backing up its
> > remaining subpages individually:
> >
> > - On the first per-subpage failure for a compound (and only if
> > order > 0), call ttm_pool_split_for_swap() to split the source
> > compound, release the subpages whose contents already live in
> > shmem (their handles in tt->pages stay valid), and retry the
> > failing subpage at order 0.
> > - Subsequent successful subpage backups in the now-split compound
> > free their source page individually as soon as the handle is
> > written.
> > - A second failure after splitting terminates the loop with partial
> > progress; the remaining order-0 subpages stay in tt->pages as
> > plain page pointers and are cleaned up by the normal
> > ttm_pool_drop_backed_up() / ttm_pool_free_range() paths.
> >
> > This restores the original split-on-OOM fallback behavior while
> > keeping the common, non-OOM case fragmentation-free. It also
> > preserves the "partial backup is allowed" contract: shrunken is
> > incremented per backed-up subpage so the caller still sees forward
> > progress when a compound only partially succeeds.
> >
> > The restore-side leftover-page branch in ttm_pool_restore_commit()
> > is left as-is for now: that path can still split a previously-retained
> > compound, but in practice it is unreachable under realistic workloads
> > (per profiling we have not been able to trigger it), so it is not
> > worth complicating the restore state machine to avoid the split
> > there. If it ever becomes a problem in practice it can be addressed
> > independently.
> >
> > ttm_pool_split_for_swap() itself is retained both for the OOM
> > fallback above and for the restore path's remaining caller. The
> > DMA-mapped pre-backup unmap loop, the purge path, ttm_pool_free_*,
> > and ttm_pool_unmap_and_free() already operate at native order and
> > are unchanged.
> >
> > Cc: Christian Koenig <[email protected]>
> > Cc: Huang Rui <[email protected]>
> > Cc: Matthew Auld <[email protected]>
> > Cc: Maarten Lankhorst <[email protected]>
> > Cc: Maxime Ripard <[email protected]>
> > Cc: Thomas Zimmermann <[email protected]>
> > Cc: David Airlie <[email protected]>
> > Cc: Simona Vetter <[email protected]>
> > Cc: [email protected]
> > Cc: [email protected]
> > Cc: [email protected]
> > Fixes: b63d715b8090 ("drm/ttm/pool, drm/ttm/tt: Provide a helper to shrink pages")
> > Suggested-by: Thomas Hellström <[email protected]>
> > Assisted-by: Claude:claude-opus-4.6
> > Signed-off-by: Matthew Brost <[email protected]>
> >
> > ---
> >
> > A follow-up should attempt writeback to shmem at folio order as well,
> > but the API for doing so is unclear and may be incomplete.
> >
> > This patch is related to the pending series [1] and significantly
> > reduces the likelihood of Xe entering a kswapd loop under
> > fragmentation. The kswapd → shrinker → Xe shrinker → TTM backup path
> > is still exercised; however, with this change the backup path no
> > longer worsens fragmentation, which previously amplified reclaim
> > pressure and reinforced the kswapd loop.
> >
> > Nonetheless, the pathological case that [1] aims to address still
> > exists and requires a proper solution. Even with this patch, a kswapd
> > loop due to severe fragmentation can still be triggered, although it
> > is now substantially harder to reproduce.
> >
> > v2:
> > - Split pages and free immediately if backup fails at higher order
> >   (Thomas)
> > v3:
> > - Skip handles in purge path (sashiko)
> >
> > [1] https://patchwork.freedesktop.org/series/165330/
> > ---
> > drivers/gpu/drm/ttm/ttm_pool.c | 87 ++++++++++++++++++++++++++++------
> > 1 file changed, 72 insertions(+), 15 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
> > index c7aab60b7f01..f9e631a20979 100644
> > --- a/drivers/gpu/drm/ttm/ttm_pool.c
> > +++ b/drivers/gpu/drm/ttm/ttm_pool.c
> > @@ -1047,12 +1047,11 @@ long ttm_pool_backup(struct ttm_pool *pool, struct ttm_tt *tt,
> >  {
> >  	struct file *backup = tt->backup;
> >  	struct page *page;
> > -	unsigned long handle;
> >  	gfp_t alloc_gfp;
> >  	gfp_t gfp;
> >  	int ret = 0;
> >  	pgoff_t shrunken = 0;
> > -	pgoff_t i, num_pages;
> > +	pgoff_t i, num_pages, npages;
> >
> >  	if (WARN_ON(ttm_tt_is_backed_up(tt)))
> >  		return -EINVAL;
> > @@ -1072,7 +1071,8 @@ long ttm_pool_backup(struct ttm_pool *pool, struct ttm_tt *tt,
> >  		unsigned int order;
> >
> >  		page = tt->pages[i];
> > -		if (unlikely(!page)) {
> > +		if (unlikely(!page ||
> > +			     ttm_backup_page_ptr_is_handle(page))) {
> >  			num_pages = 1;
> >  			continue;
> >  		}
> > @@ -1108,28 +1108,85 @@ long ttm_pool_backup(struct ttm_pool *pool, struct ttm_tt *tt,
> >  	if (IS_ENABLED(CONFIG_FAULT_INJECTION) &&
> >  	    should_fail(&backup_fault_inject, 1))
> >  		num_pages = DIV_ROUND_UP(num_pages, 2);
> >
> > -	for (i = 0; i < num_pages; ++i) {
> > -		s64 shandle;
> > +	for (i = 0; i < num_pages; i += npages) {
> > +		unsigned int order;
> > +		pgoff_t j;
> > +		bool folio_has_been_split = false;
> >
> > +		npages = 1;
> >  		page = tt->pages[i];
> >  		if (unlikely(!page))
> >  			continue;
> >
> > -		ttm_pool_split_for_swap(pool, page);
> > +		/* Already-handled entry from a previous attempt. */
> > +		if (unlikely(ttm_backup_page_ptr_is_handle(page)))
> > +			continue;
> > +
> > +		order = ttm_pool_page_order(pool, page);
> > +		npages = 1UL << order;
> >
> > -		shandle = ttm_backup_backup_page(backup, page, flags->writeback, i,
> > -						 gfp, alloc_gfp);
> > -		if (shandle < 0) {
> > -			/* We allow partially shrunken tts */
> > -			ret = shandle;
> > +		/*
> > +		 * Back up the compound atomically at its native order. If
> > +		 * fault injection truncated num_pages mid-compound, skip
> > +		 * the partial tail rather than splitting.
> > +		 */
> > +		if (unlikely(i + npages > num_pages))
> >  			break;
> > +
> > +		for (j = 0; j < npages; ++j) {
> > +			s64 shandle;
>
> I still think we should move part of this loop to
> ttm_backup_backup_folio() at this point, rather than open-coding it
> here. It's the design we want to move forward with and would probably
> make the pool code cleaner as well. If we think failures would be
> common we could have ttm_backup_backup_folio() return the number of
> pages that were actually backed up or an error. Otherwise just return
> success or error and, on error, truncate the shmem pages that were
> already copied.
>
Yes, for now I think the helper should live at the ttm_pool layer, as
it relies on several other things in ttm_pool.c that, for a fixes
patch, I don't want to shuffle around. So ttm_pool_backup_folio() I
think.
Matt
> Thanks,
> Thomas
>
>
> > +
> > +try_again_after_split:
> > +			if (IS_ENABLED(CONFIG_FAULT_INJECTION) &&
> > +			    should_fail(&backup_fault_inject, 1))
> > +				shandle = -ENOMEM;
> > +			else
> > +				shandle = ttm_backup_backup_page(backup, page + j,
> > +								 flags->writeback,
> > +								 i + j, gfp,
> > +								 alloc_gfp);
> > +
> > +			if (shandle < 0 && !folio_has_been_split && order) {
> > +				pgoff_t k;
> > +
> > +				/*
> > +				 * True OOM: could not allocate a shmem folio
> > +				 * for the next subpage. Fall back to splitting
> > +				 * the source compound and backing up subpages
> > +				 * individually. Release the already-backed-up
> > +				 * subpages whose contents now live in shmem;
> > +				 * any further failure terminates the loop with
> > +				 * partial progress (handled by the caller).
> > +				 */
> > +				folio_has_been_split = true;
> > +				ttm_pool_split_for_swap(pool, page);
> > +
> > +				for (k = 0; k < j; ++k) {
> > +					__free_pages_gpu_account(page + k, 0, false);
> > +					shrunken++;
> > +				}
> > +
> > +				goto try_again_after_split;
> > +			} else if (shandle < 0) {
> > +				ret = shandle;
> > +				goto out;
> > +			} else if (folio_has_been_split) {
> > +				__free_pages_gpu_account(page + j, 0, false);
> > +				shrunken++;
> > +			}
> > +
> > +			tt->pages[i + j] = ttm_backup_handle_to_page_ptr(shandle);
> > +		}
> > +
> > +		if (!folio_has_been_split) {
> > +			/* Compound fully backed up; free at native order. */
> > +			page->private = 0;
> > +			__free_pages_gpu_account(page, order, false);
> > +			shrunken += npages;
> >  		}
> > -		handle = shandle;
> > -		tt->pages[i] = ttm_backup_handle_to_page_ptr(handle);
> > -		__free_pages_gpu_account(page, 0, false);
> > -		shrunken++;
> >  	}
> >
> > +out:
> >  	return shrunken ? shrunken : ret;
> >  }
> >