On 5/19/25 18:37, Tvrtko Ursulin wrote: > Amdgpu_ctx_mgr_entity_fini() walks the context IDR unlocked so question is > could it in theory see a stale entry and attempt to destroy the > drm_sched_entity twice?
No, when this function is called when the file descriptor is freed up. So there should never ever be any concurrent user of this. > Problem is I have hit this on a KASAN enabled kernel only _once_ and never > since. In that case it reported amdgpu_ctx_ioctl() had freed the entity > already which would mean the question is could we possibly go through the > mutex_unlock() on one CPU, and another CPU to follow immediately with > file->release (DRM postclose) and see the stale entry. Mhm, it would basically mean that the file descriptor can be closed while some IOCTL on it is still ongoing. I think that this would be extremely ugly and should never ever happen in the first place. Adding the mutex just band aids the issue, but not really fixes it. > Throwing it out there not to forget about it, since I have manage to > lose the KASAN trace already.. If you manage to trigger that again please send it to me ASAP. Thanks, Christian. > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursu...@igalia.com> > Cc: Alex Deucher <alexander.deuc...@amd.com> > Cc: Christian König <christian.koe...@amd.com> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c > index 85567d0d9545..95b005ed839e 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c > @@ -927,6 +927,7 @@ static void amdgpu_ctx_mgr_entity_fini(struct > amdgpu_ctx_mgr *mgr) > > idp = &mgr->ctx_handles; > > + mutex_lock(&mgr->lock); > idr_for_each_entry(idp, ctx, id) { > if (kref_read(&ctx->refcount) != 1) { > DRM_ERROR("ctx %p is still alive\n", ctx); > @@ -945,6 +946,7 @@ static void amdgpu_ctx_mgr_entity_fini(struct > amdgpu_ctx_mgr *mgr) > } > } > } > + mutex_unlock(&mgr->lock); > } > > void amdgpu_ctx_mgr_fini(struct amdgpu_ctx_mgr *mgr)