On Wed, Aug 7, 2024 at 10:25 PM Faith Ekstrand <fa...@gfxstrand.net> wrote:
> On Wed, Aug 7, 2024 at 2:23 PM Joshua Ashton <jos...@froggi.es> wrote:
>
>> I was thinking about this more recently. I was initially considering
>> "maybe this should be a per-BO import," but I couldn't think of anything
>> in the GL model that would actually benefit, given it's not "true"
>> bindless and there's no update-after-bind there.
>
> That's also an option and it's the way it works on i915. However, then you
> have to pass lists of things to the kernel and that's kinda gross. If we
> need it, we can do that. Otherwise, I think a more global thing is fine. I
> think Bas is currently investigating a per-submit approach which is a tad
> different from either but should also work okay.

Yeah, I'm working on a per-submit thing (also using BOOKKEEP fences instead
of the EXPLICIT wait mode, to ensure we also don't add implicit fences).

We do have a per-BO list on submission already, so we could add things
there; it is just very annoying to implement, because at the point where we
do the fence wait/signal we have lost the association with the BO list.
Given that I don't see a use case anytime soon (there are some theoretical
ones, e.g. radeonsi might start doing READ usage instead of RW usage with
extra tracking), I feel it isn't worth the added implementation complexity.

Worth others more familiar with GL asking that question of themselves too.

>> I am definitely not totally up on what's possible there.
>>
>> Overall, I think I am OK with this approach, even though I think mixing
>> implicit and explicit sync is gross, and I want the pain that is implicit
>> sync to just go away forever. :-)
>
> So say we all...
>
> ~Faith
>
>> - Joshie 🐸✨
>>
>> On August 7, 2024 4:39:32 PM GMT+01:00, Faith Ekstrand
>> <fa...@gfxstrand.net> wrote:
>> >Previously, AMDGPU_GEM_CREATE_EXPLICIT_SYNC was used to disable implicit
>> >synchronization on BOs when explicit synchronization can be used.
>> >The problem is that this flag is per-BO and affects all amdgpu users in
>> >the system, not just the usermode driver which sets it. This can lead
>> >to some unintended consequences for userspace if not used carefully.
>> >
>> >Since the introduction of DMA_BUF_IOCTL_EXPORT_SYNC_FILE and
>> >DMA_BUF_IOCTL_IMPORT_SYNC_FILE, many userspace window system components
>> >have grown the ability to convert between the Vulkan explicit sync model
>> >and the legacy implicit sync model used by X11 and Wayland in the past.
>> >This allows both old and new components to exist simultaneously and talk
>> >to each other. In particular, XWayland is able to convert between the
>> >two to let Vulkan apps work seamlessly with older X11 compositors that
>> >aren't aware of explicit synchronization. This is rapidly becoming the
>> >backbone of synchronization in the Linux window system space.
>> >
>> >Unfortunately, AMDGPU_GEM_CREATE_EXPLICIT_SYNC breaks this because it
>> >disables all implicit synchronization on the given BO, regardless of
>> >which userspace driver is rendering with it and regardless of the
>> >assumptions made by the client application. In particular, this is
>> >causing issues with RADV and radeonsi. RADV sets the flag to disable
>> >implicit sync on window system buffers so that it can safely have them
>> >resident at all times without causing internal over-synchronization.
>> >The BO is then handed off to a compositor which composites using
>> >radeonsi. If the compositor uses the EGL_ANDROID_native_fence_sync
>> >extension to pass explicit sync files through to radeonsi, then
>> >everything is fine. However, if that buffer ever gets handed to an
>> >instance of radeonsi with any assumption of implicit synchronization,
>> >radeonsi won't be able to sync on the BO because RADV disabled implicit
>> >sync on that BO system-wide.
>> >It doesn't matter whether some window system component used
>> >DMA_BUF_IOCTL_IMPORT_SYNC_FILE to set the appropriate fence on the BO;
>> >amdgpu will ignore it.
>> >
>> >This new flag disables implicit sync on the context rather than the BO.
>> >This way RADV can disable implicit sync (which is what RADV has always
>> >wanted) without affecting other components in the system. If RADV (or
>> >some other driver) wants implicit sync on some BO, it can use
>> >DMA_BUF_IOCTL_EXPORT_SYNC_FILE and DMA_BUF_IOCTL_IMPORT_SYNC_FILE to
>> >manually synchronize with other implicit-sync components. This is the
>> >approach we've taken with NVK/nouveau and ANV/xe, and it is similar to
>> >the approach taken by ANV/i915; it works well for those drivers.
>> >
>> >Ideally, I would like to see something like this back-ported to at least
>> >the kernel where DMA_BUF_IOCTL_IMPORT/EXPORT_SYNC_FILE were introduced
>> >so that we don't have to wait another year for the fix to reach users.
>> >However, I understand that back-porting UAPI is problematic and I'll
>> >leave that decision up to the amdgpu maintainers. Michel suggested that
>> >a new CTX_OP would make more sense if we want to back-port it, but the
>> >create flag made more sense to me from an API design PoV.
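As a sketch of the user-space side of the manual bridge the commit message
describes (not part of the patch; this uses the dma-buf sync-file uapi from
linux/dma-buf.h, with a hand-mirrored fallback for pre-6.0 headers, and
assumes the dmabuf fd comes from the window system):

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

#ifndef DMA_BUF_IOCTL_EXPORT_SYNC_FILE
/* Fallback mirror of the sync-file uapi for older kernel headers */
struct dma_buf_export_sync_file {
	__u32 flags;
	__s32 fd;
};
struct dma_buf_import_sync_file {
	__u32 flags;
	__s32 fd;
};
#define DMA_BUF_IOCTL_EXPORT_SYNC_FILE	_IOWR(DMA_BUF_BASE, 2, struct dma_buf_export_sync_file)
#define DMA_BUF_IOCTL_IMPORT_SYNC_FILE	_IOW(DMA_BUF_BASE, 3, struct dma_buf_import_sync_file)
#endif

/* Pull the BO's current implicit fences out as a sync_file fd so an
 * explicit-sync driver can wait on them; returns the fd, or -1 on error. */
static int bo_export_implicit_fences(int dmabuf_fd)
{
	struct dma_buf_export_sync_file args = {
		.flags = DMA_BUF_SYNC_RW,	/* both readers and writers */
		.fd = -1,
	};

	if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_EXPORT_SYNC_FILE, &args) < 0)
		return -1;
	return args.fd;
}

/* Attach an explicit render-done fence to the BO as an implicit write
 * fence, so implicit-sync consumers (e.g. radeonsi) will wait on it. */
static int bo_import_render_fence(int dmabuf_fd, int sync_file_fd)
{
	struct dma_buf_import_sync_file args = {
		.flags = DMA_BUF_SYNC_WRITE,
		.fd = sync_file_fd,
	};

	return ioctl(dmabuf_fd, DMA_BUF_IOCTL_IMPORT_SYNC_FILE, &args);
}
```

The function names here are illustrative; the point is only that the export
side turns implicit fences into an explicit wait and the import side turns an
explicit fence into an implicit one, which is exactly the conversion XWayland
and compositors rely on.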
>> >Signed-off-by: Faith Ekstrand <faith.ekstr...@collabora.com>
>> >Cc: Alex Deucher <alexander.deuc...@amd.com>
>> >Cc: Christian König <christian.koe...@amd.com>
>> >Cc: David Airlie <airl...@gmail.com>
>> >Cc: Michel Dänzer <mdaen...@redhat.com>
>> >Cc: Bas Nieuwenhuizen <b...@basnieuwenhuizen.nl>
>> >---
>> > drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  |  3 ++-
>> > drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 12 ++++++++----
>> > drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h |  7 +++++++
>> > include/uapi/drm/amdgpu_drm.h           | 12 +++++++++++-
>> > 4 files changed, 28 insertions(+), 6 deletions(-)
>> >
>> >diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> >index ec888fc6ead8..8410b4426541 100644
>> >--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> >+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> >@@ -1196,7 +1196,8 @@ static int amdgpu_cs_sync_rings(struct amdgpu_cs_parser *p)
>> > 		struct dma_resv *resv = bo->tbo.base.resv;
>> > 		enum amdgpu_sync_mode sync_mode;
>> >
>> >-		sync_mode = amdgpu_bo_explicit_sync(bo) ?
>> >+		sync_mode = (amdgpu_ctx_explicit_sync(p->ctx) ||
>> >+			     amdgpu_bo_explicit_sync(bo)) ?
>> > 			AMDGPU_SYNC_EXPLICIT : AMDGPU_SYNC_NE_OWNER;
>> > 		r = amdgpu_sync_resv(p->adev, &p->sync, resv, sync_mode,
>> > 				     &fpriv->vm);
>> >diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>> >index 5cb33ac99f70..a304740ccedf 100644
>> >--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>> >+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>> >@@ -318,7 +318,8 @@ static int amdgpu_ctx_get_stable_pstate(struct amdgpu_ctx *ctx,
>> > }
>> >
>> > static int amdgpu_ctx_init(struct amdgpu_ctx_mgr *mgr, int32_t priority,
>> >-			   struct drm_file *filp, struct amdgpu_ctx *ctx)
>> >+			   uint32_t flags, struct drm_file *filp,
>> >+			   struct amdgpu_ctx *ctx)
>> > {
>> > 	struct amdgpu_fpriv *fpriv = filp->driver_priv;
>> > 	u32 current_stable_pstate;
>> >@@ -334,6 +335,7 @@ static int amdgpu_ctx_init(struct amdgpu_ctx_mgr *mgr, int32_t priority,
>> > 	ctx->mgr = mgr;
>> > 	spin_lock_init(&ctx->ring_lock);
>> >
>> >+	ctx->flags = flags;
>> > 	ctx->reset_counter = atomic_read(&mgr->adev->gpu_reset_counter);
>> > 	ctx->reset_counter_query = ctx->reset_counter;
>> > 	ctx->generation = amdgpu_vm_generation(mgr->adev, &fpriv->vm);
>> >@@ -474,6 +476,7 @@ static int amdgpu_ctx_alloc(struct amdgpu_device *adev,
>> > 			    struct amdgpu_fpriv *fpriv,
>> > 			    struct drm_file *filp,
>> > 			    int32_t priority,
>> >+			    uint32_t flags,
>> > 			    uint32_t *id)
>> > {
>> > 	struct amdgpu_ctx_mgr *mgr = &fpriv->ctx_mgr;
>> >@@ -493,7 +496,7 @@ static int amdgpu_ctx_alloc(struct amdgpu_device *adev,
>> > 	}
>> >
>> > 	*id = (uint32_t)r;
>> >-	r = amdgpu_ctx_init(mgr, priority, filp, ctx);
>> >+	r = amdgpu_ctx_init(mgr, priority, flags, filp, ctx);
>> > 	if (r) {
>> > 		idr_remove(&mgr->ctx_handles, *id);
>> > 		*id = 0;
>> >@@ -666,7 +669,7 @@ int amdgpu_ctx_ioctl(struct drm_device *dev, void *data,
>> > 		     struct drm_file *filp)
>> > {
>> > 	int r;
>> >-	uint32_t id, stable_pstate;
>> >+	uint32_t id, stable_pstate, flags;
>> > 	int32_t priority;
>> >
>> > 	union drm_amdgpu_ctx *args = data;
>> >@@ -675,6 +678,7 @@ int amdgpu_ctx_ioctl(struct drm_device *dev, void *data,
>> >
>> > 	id = args->in.ctx_id;
>> > 	priority = args->in.priority;
>> >+	flags = args->in.flags;
>> >
>> > 	/* For backwards compatibility, we need to accept ioctls with garbage
>> > 	 * in the priority field. Garbage values in the priority field, result
>> >@@ -685,7 +689,7 @@ int amdgpu_ctx_ioctl(struct drm_device *dev, void *data,
>> >
>> > 	switch (args->in.op) {
>> > 	case AMDGPU_CTX_OP_ALLOC_CTX:
>> >-		r = amdgpu_ctx_alloc(adev, fpriv, filp, priority, &id);
>> >+		r = amdgpu_ctx_alloc(adev, fpriv, filp, priority, flags, &id);
>> > 		args->out.alloc.ctx_id = id;
>> > 		break;
>> > 	case AMDGPU_CTX_OP_FREE_CTX:
>> >diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
>> >index 85376baaa92f..9431c8d2ea59 100644
>> >--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
>> >+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
>> >@@ -45,6 +45,7 @@ struct amdgpu_ctx_entity {
>> > struct amdgpu_ctx {
>> > 	struct kref			refcount;
>> > 	struct amdgpu_ctx_mgr		*mgr;
>> >+	uint32_t			flags;
>> > 	unsigned			reset_counter;
>> > 	unsigned			reset_counter_query;
>> > 	uint64_t			generation;
>> >@@ -84,6 +85,12 @@ struct dma_fence *amdgpu_ctx_get_fence(struct amdgpu_ctx *ctx,
>> > bool amdgpu_ctx_priority_is_valid(int32_t ctx_prio);
>> > void amdgpu_ctx_priority_override(struct amdgpu_ctx *ctx, int32_t ctx_prio);
>> >
>> >+static inline bool
>> >+amdgpu_ctx_explicit_sync(struct amdgpu_ctx *ctx)
>> >+{
>> >+	return ctx->flags & AMDGPU_CTX_ALLOC_FLAGS_EXPLICIT_SYNC;
>> >+}
>> >+
>> > int amdgpu_ctx_ioctl(struct drm_device *dev, void *data,
>> > 		     struct drm_file *filp);
>> >
>> >diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
>> >index 96e32dafd4f0..e9d87a6e3d86 100644
>> >--- a/include/uapi/drm/amdgpu_drm.h
>> >+++ b/include/uapi/drm/amdgpu_drm.h
>> >@@ -125,7 +125,14 @@ extern "C" {
>> > #define AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS	(1 << 5)
>> >
>> > /* Flag that BO is always valid in this VM */
>> > #define AMDGPU_GEM_CREATE_VM_ALWAYS_VALID	(1 << 6)
>> >-/* Flag that BO sharing will be explicitly synchronized */
>> >+/* Flag that BO sharing will be explicitly synchronized
>> >+ *
>> >+ * This flag should not be used unless the client can guarantee that no
>> >+ * other driver which ever touches this BO will ever want to use implicit
>> >+ * synchronization as it disables implicit sync on this BO system-wide.
>> >+ * Instead, drivers which use an explicit synchronization model should
>> >+ * prefer AMDGPU_CTX_ALLOC_FLAGS_EXPLICIT_SYNC.
>> >+ */
>> > #define AMDGPU_GEM_CREATE_EXPLICIT_SYNC		(1 << 7)
>> > /* Flag that indicates allocating MQD gart on GFX9, where the mtype
>> >  * for the second page onward should be set to NC. It should never
>> >@@ -240,6 +247,9 @@ union drm_amdgpu_bo_list {
>> > #define AMDGPU_CTX_OP_GET_STABLE_PSTATE	5
>> > #define AMDGPU_CTX_OP_SET_STABLE_PSTATE	6
>> >
>> >+/* indicate that all synchronization will be explicit */
>> >+#define AMDGPU_CTX_ALLOC_FLAGS_EXPLICIT_SYNC	(1<<0)
>> >+
>> > /* GPU reset status */
>> > #define AMDGPU_CTX_NO_RESET		0
>> > /* this the context caused it */
>> >
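For reference, allocating a context with the proposed flag from user space
would look roughly like this. This is a sketch, not part of the patch: the
uapi constants and struct layout are mirrored (and trimmed) by hand so the
snippet is self-contained; real code should use drm.h and amdgpu_drm.h, and
AMDGPU_CTX_ALLOC_FLAGS_EXPLICIT_SYNC is only what this patch proposes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Mirrored from include/uapi/drm/amdgpu_drm.h (plus the proposed flag) */
#define AMDGPU_CTX_OP_ALLOC_CTX			1
#define AMDGPU_CTX_ALLOC_FLAGS_EXPLICIT_SYNC	(1 << 0)	/* proposed */

union drm_amdgpu_ctx {
	struct {
		uint32_t op;
		uint32_t flags;		/* would carry the ALLOC flags */
		uint32_t ctx_id;
		int32_t  priority;
	} in;
	union {
		struct {
			uint32_t ctx_id;
			uint32_t _pad;
		} alloc;
		uint8_t _size[16];	/* keep sizeof() matching the real uapi */
	} out;
};

/* DRM_COMMAND_BASE (0x40) + DRM_AMDGPU_CTX (0x02) on the 'd' ioctl type */
#define DRM_IOCTL_AMDGPU_CTX	_IOWR('d', 0x42, union drm_amdgpu_ctx)

/* Allocate a context on which amdgpu never does implicit sync; returns 0
 * and stores the new context id on success, -1 on error. Note the patch
 * adds no flag validation, so an unpatched kernel would silently ignore
 * the flag rather than reject it. */
static int amdgpu_ctx_alloc_explicit(int drm_fd, uint32_t *ctx_id)
{
	union drm_amdgpu_ctx args;

	memset(&args, 0, sizeof(args));
	args.in.op = AMDGPU_CTX_OP_ALLOC_CTX;
	args.in.flags = AMDGPU_CTX_ALLOC_FLAGS_EXPLICIT_SYNC;

	if (ioctl(drm_fd, DRM_IOCTL_AMDGPU_CTX, &args) < 0)
		return -1;

	*ctx_id = args.out.alloc.ctx_id;
	return 0;
}
```

The drm_fd here is assumed to be an open amdgpu render node; the helper name
is made up for the illustration.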