Re: [RFC] amdgpu: Add a context flag to disable implicit sync

Joshua Ashton Wed, 07 Aug 2024 12:38:30 -0700

I was thinking about this more recently. I was initially considering "maybe 
this should be a per-BO import," but I couldn't think of anything in the GL 
model that would actually benefit given its not "true" bindless and there's no 
update-after-bind there.


Worth others more familiar with GL asking that question to themselves also. I 
am definitely not totally up on what's possible there.

Overall, I think I am OK with this approach, even though I think mixing 
implicit and explicit sync is gross, and I want the pain that is implicit sync 
to just go away forever. :-)

- Joshie 🐸✨


On August 7, 2024 4:39:32 PM GMT+01:00, Faith Ekstrand <fa...@gfxstrand.net> 
wrote:
>Previously, AMDGPU_GEM_CREATE_EXPLICIT_SYNC was used to disable implicit
>synchronization on BOs when explicit synchronization can be used.  The
>problem is that this flag is per-BO and affects all amdgpu users in the
>system, not just the usermode drver which sets it.  This can lead to
>some unintended consequences for userspace if not used carefully.
>
>Since the introduction of DMA_BUF_IOCTL_EXPORT_SYNC_FILE and
>DMA_BUF_IOCTL_IMPORT_SYNC_FILE, many userspace window system components
>have grown the ability to convert between the Vulkan explicit sync model
>and the legacy implicit sync model used by X11 and Wayland in the past.
>This allows both old and new components to exist simultaneously and talk
>to each other.  In particular, XWayland is able to convert between the
>two to let Vulkan apps work seamlessly with older X11 compositors that
>aren't aware of explicit synchronizaiton.  This is rapidly becoming the
>backbone of synchronization in the Linux window system space.
>
>Unfortunately, AMDGPU_GEM_CREATE_EXPLICIT_SYNC breaks this because it
>disables all implicit synchronization on the given BO, regardless of
>which userspace driver is rendering with it and regardless of the
>assumptions made by the client application.  In particular, this is
>causing issues with RADV and radeonsi.  RADV sets the flag to disable
>implicit sync on window system buffers so that it can safely have them
>resident at all times without causing internal over-synchronization.
>The BO is then handed off to a compositor which composites using
>radeonsi.  If the compositor uses the EGL_ANDROID_native_fence_sync
>extension to pass explicit sync files through to radeonsi, then
>everything is fine.  However, if that buffer ever gets handed to an
>instance of radeonsi with any assumption of implicit synchronization,
>radeonsi won't be able sync on the BO because RADV disabled implict sync
>on that BO system-wide.  It doesn't matter whether some window system
>component used DMA_BUF_IOCTL_IMPORT_SYNC_FILE to set the appropriate
>fence on the BO, amdgpu will ignore it.
>
>This new flag disables implicit sync on the context rather than the BO.
>This way RADV can disable implicit sync (which is what RADV has always
>wanted) without affecting other components in the system.  If RADV (or
>some other driver) wants implicit sync on some BO, it can use
>DMA_BUF_IOCTL_EXPORT_SYNC_FILE and DMA_BUF_IOCTL_IMPORT_SYNC_FILE to
>manually synchronize with other implicit-sync components.  This is the
>approach we've taken with NVK/nouveau, ANV/xe, and similar to the
>approach taken by ANV/i915 and it works well for those drivers.
>
>Ideally, I would like to see something like this back-ported to at least
>the kernel where DMA_BUF_IOCTL_IMPORT/EXPORT_SYNC_FILE were introduced
>so that we don't have to wait another year for the fix to reach users.
>However, I understand that back-porting UAPI is problematic and I'll
>leave that decision up to the amdgpu maintainers.  Michel suggested that
>a new CTX_OP would make more sense if we want to back-port it but the
>create flag made more sense to me from an API design PoV.
>
>Signed-off-by: Faith Ekstrand <faith.ekstr...@collabora.com>
>Cc: Alex Deucher <alexander.deuc...@amd.com>
>Cc: Christian König <christian.koe...@amd.com>
>Cc: David Airlie <airl...@gmail.com>
>Cc: Michel Dänzer <mdaen...@redhat.com>
>Cc: Bas Nieuwenhuizen <b...@basnieuwenhuizen.nl>
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  |  3 ++-
> drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 12 ++++++++----
> drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h |  7 +++++++
> include/uapi/drm/amdgpu_drm.h           | 12 +++++++++++-
> 4 files changed, 28 insertions(+), 6 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>index ec888fc6ead8..8410b4426541 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>@@ -1196,7 +1196,8 @@ static int amdgpu_cs_sync_rings(struct amdgpu_cs_parser 
>*p)
>               struct dma_resv *resv = bo->tbo.base.resv;
>               enum amdgpu_sync_mode sync_mode;
> 
>-              sync_mode = amdgpu_bo_explicit_sync(bo) ?
>+              sync_mode = (amdgpu_ctx_explicit_sync(p->ctx) ||
>+                           amdgpu_bo_explicit_sync(bo)) ?
>                       AMDGPU_SYNC_EXPLICIT : AMDGPU_SYNC_NE_OWNER;
>               r = amdgpu_sync_resv(p->adev, &p->sync, resv, sync_mode,
>                                    &fpriv->vm);
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>index 5cb33ac99f70..a304740ccedf 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>@@ -318,7 +318,8 @@ static int amdgpu_ctx_get_stable_pstate(struct amdgpu_ctx 
>*ctx,
> }
> 
> static int amdgpu_ctx_init(struct amdgpu_ctx_mgr *mgr, int32_t priority,
>-                         struct drm_file *filp, struct amdgpu_ctx *ctx)
>+                         uint32_t flags, struct drm_file *filp,
>+                         struct amdgpu_ctx *ctx)
> {
>       struct amdgpu_fpriv *fpriv = filp->driver_priv;
>       u32 current_stable_pstate;
>@@ -334,6 +335,7 @@ static int amdgpu_ctx_init(struct amdgpu_ctx_mgr *mgr, 
>int32_t priority,
>       ctx->mgr = mgr;
>       spin_lock_init(&ctx->ring_lock);
> 
>+      ctx->flags = flags;
>       ctx->reset_counter = atomic_read(&mgr->adev->gpu_reset_counter);
>       ctx->reset_counter_query = ctx->reset_counter;
>       ctx->generation = amdgpu_vm_generation(mgr->adev, &fpriv->vm);
>@@ -474,6 +476,7 @@ static int amdgpu_ctx_alloc(struct amdgpu_device *adev,
>                           struct amdgpu_fpriv *fpriv,
>                           struct drm_file *filp,
>                           int32_t priority,
>+                          uint32_t flags,
>                           uint32_t *id)
> {
>       struct amdgpu_ctx_mgr *mgr = &fpriv->ctx_mgr;
>@@ -493,7 +496,7 @@ static int amdgpu_ctx_alloc(struct amdgpu_device *adev,
>       }
> 
>       *id = (uint32_t)r;
>-      r = amdgpu_ctx_init(mgr, priority, filp, ctx);
>+      r = amdgpu_ctx_init(mgr, priority, flags, filp, ctx);
>       if (r) {
>               idr_remove(&mgr->ctx_handles, *id);
>               *id = 0;
>@@ -666,7 +669,7 @@ int amdgpu_ctx_ioctl(struct drm_device *dev, void *data,
>                    struct drm_file *filp)
> {
>       int r;
>-      uint32_t id, stable_pstate;
>+      uint32_t id, stable_pstate, flags;
>       int32_t priority;
> 
>       union drm_amdgpu_ctx *args = data;
>@@ -675,6 +678,7 @@ int amdgpu_ctx_ioctl(struct drm_device *dev, void *data,
> 
>       id = args->in.ctx_id;
>       priority = args->in.priority;
>+      flags = args->in.flags;
> 
>       /* For backwards compatibility, we need to accept ioctls with garbage
>        * in the priority field. Garbage values in the priority field, result
>@@ -685,7 +689,7 @@ int amdgpu_ctx_ioctl(struct drm_device *dev, void *data,
> 
>       switch (args->in.op) {
>       case AMDGPU_CTX_OP_ALLOC_CTX:
>-              r = amdgpu_ctx_alloc(adev, fpriv, filp, priority, &id);
>+              r = amdgpu_ctx_alloc(adev, fpriv, filp, priority, flags, &id);
>               args->out.alloc.ctx_id = id;
>               break;
>       case AMDGPU_CTX_OP_FREE_CTX:
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h 
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
>index 85376baaa92f..9431c8d2ea59 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
>@@ -45,6 +45,7 @@ struct amdgpu_ctx_entity {
> struct amdgpu_ctx {
>       struct kref                     refcount;
>       struct amdgpu_ctx_mgr           *mgr;
>+      uint32_t                        flags;
>       unsigned                        reset_counter;
>       unsigned                        reset_counter_query;
>       uint64_t                        generation;
>@@ -84,6 +85,12 @@ struct dma_fence *amdgpu_ctx_get_fence(struct amdgpu_ctx 
>*ctx,
> bool amdgpu_ctx_priority_is_valid(int32_t ctx_prio);
> void amdgpu_ctx_priority_override(struct amdgpu_ctx *ctx, int32_t ctx_prio);
> 
>+static inline bool
>+amdgpu_ctx_explicit_sync(struct amdgpu_ctx *ctx)
>+{
>+      return ctx->flags & AMDGPU_CTX_ALLOC_FLAGS_EXPLICIT_SYNC;
>+}
>+
> int amdgpu_ctx_ioctl(struct drm_device *dev, void *data,
>                    struct drm_file *filp);
> 
>diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
>index 96e32dafd4f0..e9d87a6e3d86 100644
>--- a/include/uapi/drm/amdgpu_drm.h
>+++ b/include/uapi/drm/amdgpu_drm.h
>@@ -125,7 +125,14 @@ extern "C" {
> #define AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS     (1 << 5)
> /* Flag that BO is always valid in this VM */
> #define AMDGPU_GEM_CREATE_VM_ALWAYS_VALID     (1 << 6)
>-/* Flag that BO sharing will be explicitly synchronized */
>+/* Flag that BO sharing will be explicitly synchronized
>+ *
>+ * This flag should not be used unless the client can guarantee that no
>+ * other driver which ever touches this BO will ever want to use implicit
>+ * synchronization as it disables implicit sync on this BO system-wide.
>+ * Instead, drivers which use an explicit synchronization model should
>+ * prefer AMDGPU_CTX_ALLOC_FLAGS_EXPLICIT_SYNC.
>+ */
> #define AMDGPU_GEM_CREATE_EXPLICIT_SYNC               (1 << 7)
> /* Flag that indicates allocating MQD gart on GFX9, where the mtype
>  * for the second page onward should be set to NC. It should never
>@@ -240,6 +247,9 @@ union drm_amdgpu_bo_list {
> #define AMDGPU_CTX_OP_GET_STABLE_PSTATE       5
> #define AMDGPU_CTX_OP_SET_STABLE_PSTATE       6
> 
>+/* indicate that all synchronization will be explicit */
>+#define AMDGPU_CTX_ALLOC_FLAGS_EXPLICIT_SYNC (1<<0)
>+
> /* GPU reset status */
> #define AMDGPU_CTX_NO_RESET           0
> /* this the context caused it */

Re: [RFC] amdgpu: Add a context flag to disable implicit sync

Reply via email to