xe: Strict migration policy for atomic SVM faults

Thomas Hellström Fri, 25 Apr 2025 00:18:36 -0700

On Thu, 2025-04-24 at 11:03 -0700, Matthew Brost wrote:
> On Thu, Apr 24, 2025 at 04:39:21PM +0200, Thomas Hellström wrote:
> > On Tue, 2025-04-22 at 10:04 -0700, Matthew Brost wrote:
> > > Mixing GPU and CPU atomics does not work unless a strict
> > > migration
> > > policy of GPU atomics must be device memory. Enforce a policy of
> > > must
> > > be
> > > in VRAM with a retry loop of 2 attempts, if retry loop fails
> > > abort
> > > fault.
> > > 
> > > v2:
> > >  - Only retry migration on atomics
> > >  - Drop alway migrate modparam
> > > v3:
> > >  - Only set vram_only on DGFX (Himal)
> > >  - Bail on get_pages failure if vram_only and retry count
> > > exceeded
> > > (Himal)
> > >  - s/vram_only/devmem_only
> > >  - Update xe_svm_range_is_valid to accept devmem_only argument
> > > v4:
> > >  - Fix logic bug get_pages failure
> > > 
> > > Signed-off-by: Himal Prasad Ghimiray
> > > <himal.prasad.ghimi...@intel.com>
> > > Signed-off-by: Matthew Brost <matthew.br...@intel.com>
> > > ---
> > >  drivers/gpu/drm/xe/xe_module.c |  3 --
> > >  drivers/gpu/drm/xe/xe_module.h |  1 -
> > >  drivers/gpu/drm/xe/xe_svm.c    | 89 +++++++++++++++++++++++++---
> > > ----
> > > --
> > >  drivers/gpu/drm/xe/xe_svm.h    |  5 --
> > >  4 files changed, 65 insertions(+), 33 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_module.c
> > > b/drivers/gpu/drm/xe/xe_module.c
> > > index 05c7d0ae6d83..1c4dfafbcd0b 100644
> > > --- a/drivers/gpu/drm/xe/xe_module.c
> > > +++ b/drivers/gpu/drm/xe/xe_module.c
> > > @@ -33,9 +33,6 @@ struct xe_modparam xe_modparam = {
> > >  module_param_named(svm_notifier_size,
> > > xe_modparam.svm_notifier_size,
> > > uint, 0600);
> > >  MODULE_PARM_DESC(svm_notifier_size, "Set the svm notifier
> > > size(in
> > > MiB), must be power of 2");
> > >  
> > > -module_param_named(always_migrate_to_vram,
> > > xe_modparam.always_migrate_to_vram, bool, 0444);
> > > -MODULE_PARM_DESC(always_migrate_to_vram, "Always migrate to VRAM
> > > on
> > > GPU fault");
> > > -
> >  module_param_named_unsafe(force_execlist,
> > > xe_modparam.force_execlist, bool, 0444);
> > >  MODULE_PARM_DESC(force_execlist, "Force Execlist submission");
> > >  
> > > diff --git a/drivers/gpu/drm/xe/xe_module.h
> > > b/drivers/gpu/drm/xe/xe_module.h
> > > index 84339e509c80..5a3bfea8b7b4 100644
> > > --- a/drivers/gpu/drm/xe/xe_module.h
> > > +++ b/drivers/gpu/drm/xe/xe_module.h
> > > @@ -12,7 +12,6 @@
> > >  struct xe_modparam {
> > >   bool force_execlist;
> > >   bool probe_display;
> > > - bool always_migrate_to_vram;
> > >   u32 force_vram_bar_size;
> > >   int guc_log_level;
> > >   char *guc_firmware_path;
> > > diff --git a/drivers/gpu/drm/xe/xe_svm.c
> > > b/drivers/gpu/drm/xe/xe_svm.c
> > > index 890f6b2f40e9..f749ae367a8f 100644
> > > --- a/drivers/gpu/drm/xe/xe_svm.c
> > > +++ b/drivers/gpu/drm/xe/xe_svm.c
> > > @@ -650,9 +650,11 @@ void xe_svm_fini(struct xe_vm *vm)
> > >  }
> > >  
> > >  static bool xe_svm_range_is_valid(struct xe_svm_range *range,
> > > -                           struct xe_tile *tile)
> > > +                           struct xe_tile *tile,
> > > +                           bool devmem_only)
> > >  {
> > > - return (range->tile_present & ~range->tile_invalidated)
> > > &
> > > BIT(tile->id);
> > > + return ((range->tile_present & ~range->tile_invalidated)
> > > &
> > > BIT(tile->id))
> > > +         && (!devmem_only || range-
> > > > base.flags.migrate_devmem);
> > >  }
> > 
> > So let's say devmem_only is true here, and range-
> > > base.flags.migrate_devmem is false. Wouldn't that mean the range
> > > is
> > unusable and needs to be freed and re-allocated?
> > 
> 
> This is typo, this should be s/migrate_devmem/has_devmem_pages.
> 
> This translates to:
> 
> Either devmem_only is not required or we have devmem pages with a
> valid mapping.
> 
> If migrate_devmem is false and devmem_only is true, that is a fatal
> error actually, we should have check for that and kill the fault. An
> example of this would be shared mapping which cannot be migrated to
> devmem.
>  
> > Also another thing going back to older code, it seems like range-
> > > tile_invalidated is protected by the notifier lock, so shouldn't
> > > we
> > assert that to be held in the function? It seems not to be held
> > further
> > below:
> 
> Yea techincally to get a stable value we'd need the notifier lock but
> this is an opportunistic check - at worst if we read a valid range we
> skip the page faults and will immediately get another page fault. So
> we
> could take the notifier lock here but I don't think this is strickly
> required. Let me know what you think here.


The problem with this is that the code gets harder to maintain and
understand. A new reader would probably first react over the lockless
read, and then why there are no memory barriers and then what happens
if the page-fault was marked as resolved without actually resolving it.

So IMO if we do opportunistic tests to opt out of locking (which is
discouraged in the drm locking guidelines
https://blog.ffwll.ch/2022/08/locking-hierarchy.html)
we should definitely add separate functions for that with extensive
docs and READ_ONCE() annotation.

But also think if this is really worth sacrificing readability instead
of actually relying on alloc_vram() and get_pages() exiting early if
everything looks ok?

> 
> > 
> > >  
> > >  #if IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR)
> > > @@ -726,6 +728,35 @@ static int xe_svm_alloc_vram(struct xe_vm
> > > *vm,
> > > struct xe_tile *tile,
> > >  }
> > >  #endif
> > >  
> > > +static bool supports_4K_migration(struct xe_device *xe)
> > > +{
> > > + if (xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K)
> > > +         return false;
> > > +
> > > + return true;
> > > +}
> > 
> > Do we have any hardware that supports pagefaults but not 4K VRAM
> > pages?
> > 
> 
> PVC

OK, I was under the impression that PVC actually supported 4K pages.
But perhaps there was a bug encountered while implementing that.


> 
> > > +
> > > +static bool xe_svm_range_needs_migrate_to_vram(struct
> > > xe_svm_range
> > > *range,
> > > +                                        struct xe_vma
> > > *vma)
> > > +{
> > > + struct xe_vm *vm = range_to_vm(&range->base);
> > > + u64 range_size = xe_svm_range_size(range);
> > > +
> > > + if (!range->base.flags.migrate_devmem)
> > > +         return false;
> > > +
> > > + if (xe_svm_range_in_vram(range)) {
> > > +         drm_dbg(&vm->xe->drm, "Range is already in
> > > VRAM\n");
> > > +         return false;
> > > + }
> > > +
> > > + if (range_size <= SZ_64K && !supports_4K_migration(vm-
> > > >xe))
> > > {
> > > +         drm_dbg(&vm->xe->drm, "Platform doesn't support
> > > SZ_4K range migration\n");
> > > +         return false;
> > > + }
> > > +
> > > + return true;
> > > +}
> > >  
> > >  /**
> > >   * xe_svm_handle_pagefault() - SVM handle page fault
> > > @@ -750,12 +781,15 @@ int xe_svm_handle_pagefault(struct xe_vm
> > > *vm,
> > > struct xe_vma *vma,
> > >                   IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR),
> > >           .check_pages_threshold = IS_DGFX(vm->xe) &&
> > >                   IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR)
> > > ?
> > > SZ_64K : 0,
> > > +         .devmem_only = atomic && IS_DGFX(vm->xe) &&
> > > +                 IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR),
> > >   };
> > >   struct xe_svm_range *range;
> > >   struct drm_gpusvm_range *r;
> > >   struct drm_exec exec;
> > >   struct dma_fence *fence;
> > >   struct xe_tile *tile = gt_to_tile(gt);
> > > + int migrate_try_count = ctx.devmem_only ? 3 : 1;
> > >   ktime_t end = 0;
> > >   int err;
> > >  
> > > @@ -777,23 +811,26 @@ int xe_svm_handle_pagefault(struct xe_vm
> > > *vm,
> > > struct xe_vma *vma,
> > >           return PTR_ERR(r);
> > >  
> > >   range = to_xe_range(r);
> > > - if (xe_svm_range_is_valid(range, tile))
> > > + if (xe_svm_range_is_valid(range, tile, ctx.devmem_only))
> > 
> > Requires notifier lock. Also see comment on re-allocating the range
> > above.
> > 
> 
> Same as above.
> 
> > >           return 0;
> > >  
> > >   range_debug(range, "PAGE FAULT");
> > >  
> > > - /* XXX: Add migration policy, for now migrate range once
> > > */
> > > - if (!range->skip_migrate && range-
> > > >base.flags.migrate_devmem
> > > &&
> > > -     xe_svm_range_size(range) >= SZ_64K) {
> > > -         range->skip_migrate = true;
> > > -
> > > + if (--migrate_try_count >= 0 &&
> > > +     xe_svm_range_needs_migrate_to_vram(range, vma)
> > 
> > Requires notifier lock.
> > 
> 
> Same as above.
> 
> > Should we have some sort of timeout instead of a try-count? Perhaps
> > as
> > a last resort fall back to a 4K range?
> > 
> 
> I did have code like that at one point to reduce range size but it is
> a
> bit complicated as we'd have to remove the range... I'd rather stick
> with the retry loop for now and if this becomes problematic, circle
> back
> to reducing the size of the fault page on each retry loop.

OK, makes sense.

/Thomas


> 
> Matt
> 
> > /Thomas
> > 
> > 
> > 
> > > ) {
> > >           err = xe_svm_alloc_vram(vm, tile, range, &ctx);
> > >           if (err) {
> > > -                 drm_dbg(&vm->xe->drm,
> > > -                         "VRAM allocation failed, falling
> > > back to "
> > > -                         "retrying fault, asid=%u,
> > > errno=%pe\n",
> > > -                         vm->usm.asid, ERR_PTR(err));
> > > -                 goto retry;
> > > +                 if (migrate_try_count ||
> > > !ctx.devmem_only) {
> > > +                         drm_dbg(&vm->xe->drm,
> > > +                                 "VRAM allocation failed,
> > > falling back to retrying fault, asid=%u, errno=%pe\n",
> > > +                                 vm->usm.asid,
> > > ERR_PTR(err));
> > > +                         goto retry;
> > > +                 } else {
> > > +                         drm_err(&vm->xe->drm,
> > > +                                 "VRAM allocation failed,
> > > retry count exceeded, asid=%u, errno=%pe\n",
> > > +                                 vm->usm.asid,
> > > ERR_PTR(err));
> > > +                         return err;
> > > +                 }
> > >           }
> > >   }
> > >  
> > > @@ -801,15 +838,22 @@ int xe_svm_handle_pagefault(struct xe_vm
> > > *vm,
> > > struct xe_vma *vma,
> > >   err = drm_gpusvm_range_get_pages(&vm->svm.gpusvm, r,
> > > &ctx);
> > >   /* Corner where CPU mappings have changed */
> > >   if (err == -EOPNOTSUPP || err == -EFAULT || err == -
> > > EPERM) {
> > > -         if (err == -EOPNOTSUPP) {
> > > -                 range_debug(range, "PAGE FAULT - EVICT
> > > PAGES");
> > > -                 drm_gpusvm_range_evict(&vm->svm.gpusvm,
> > > &range->base);
> > > +         if (migrate_try_count > 0 || !ctx.devmem_only) {
> > > +                 if (err == -EOPNOTSUPP) {
> > > +                         range_debug(range, "PAGE FAULT -
> > > EVICT PAGES");
> > > +                         drm_gpusvm_range_evict(&vm-
> > > > svm.gpusvm,
> > > +                                                &range-
> > > > base);
> > > +                 }
> > > +                 drm_dbg(&vm->xe->drm,
> > > +                         "Get pages failed, falling back
> > > to
> > > retrying, asid=%u, gpusvm=%p, errno=%pe\n",
> > > +                         vm->usm.asid, &vm->svm.gpusvm,
> > > ERR_PTR(err));
> > > +                 range_debug(range, "PAGE FAULT - RETRY
> > > PAGES");
> > > +                 goto retry;
> > > +         } else {
> > > +                 drm_err(&vm->xe->drm,
> > > +                         "Get pages failed, retry count
> > > exceeded, asid=%u, gpusvm=%p, errno=%pe\n",
> > > +                         vm->usm.asid, &vm->svm.gpusvm,
> > > ERR_PTR(err));
> > >           }
> > > -         drm_dbg(&vm->xe->drm,
> > > -                 "Get pages failed, falling back to
> > > retrying,
> > > asid=%u, gpusvm=%p, errno=%pe\n",
> > > -                 vm->usm.asid, &vm->svm.gpusvm,
> > > ERR_PTR(err));
> > > -         range_debug(range, "PAGE FAULT - RETRY PAGES");
> > > -         goto retry;
> > >   }
> > >   if (err) {
> > >           range_debug(range, "PAGE FAULT - FAIL PAGE
> > > COLLECT");
> > > @@ -843,9 +887,6 @@ int xe_svm_handle_pagefault(struct xe_vm *vm,
> > > struct xe_vma *vma,
> > >   }
> > >   drm_exec_fini(&exec);
> > >  
> > > - if (xe_modparam.always_migrate_to_vram)
> > > -         range->skip_migrate = false;
> > > -
> > >   dma_fence_wait(fence, false);
> > >   dma_fence_put(fence);
> > >  
> > > diff --git a/drivers/gpu/drm/xe/xe_svm.h
> > > b/drivers/gpu/drm/xe/xe_svm.h
> > > index 3d441eb1f7ea..0e1f376a7471 100644
> > > --- a/drivers/gpu/drm/xe/xe_svm.h
> > > +++ b/drivers/gpu/drm/xe/xe_svm.h
> > > @@ -39,11 +39,6 @@ struct xe_svm_range {
> > >    * range. Protected by GPU SVM notifier lock.
> > >    */
> > >   u8 tile_invalidated;
> > > - /**
> > > -  * @skip_migrate: Skip migration to VRAM, protected by
> > > GPU
> > > fault handler
> > > -  * locking.
> > > -  */
> > > - u8 skip_migrate :1;
> > >  };
> > >  
> > >  /**
> >

Re: [PATCH v4 2/5] drm/xe: Strict migration policy for atomic SVM faults

Reply via email to