On 5/15/25 11:46, Jesse.Zhang wrote:
> The current cleanup order during file descriptor close can lead to
> a race condition where the eviction fence worker attempts to access
> a destroyed mutex from the user queue manager:
>
> [ 517.294055] DEBUG_LOCKS_WARN_ON(lock->magic != lock)
> [ 517.294060] WARNING: CPU: 8 PID: 2030 at kernel/locking/mutex.c:564
> [ 517.294094] Workqueue: events amdgpu_eviction_fence_suspend_worker [amdgpu]
>
> The issue occurs because:
> 1. We destroy the user queue manager (including its mutex) first
> 2. Then try to destroy eviction fences which may have pending work
> 3. The eviction fence worker may try to access the already-destroyed mutex
>
> Fix this by reordering the cleanup to:
> 1. First mark the fd as closing and destroy eviction fences,
>    which flushes any pending work
> 2. Then safely destroy the user queue manager after we're certain
>    no more fence work will be executed
>
> v2: remove the copy in amdgpu_driver_postclose_kms() (Christian)
>
> Signed-off-by: Jesse Zhang <jesse.zh...@amd.com>

Reviewed-by: Christian König <christian.koe...@amd.com>

We should probably clean that up further, but that is unnecessary here.

Regards,
Christian.

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 5 -----
>  2 files changed, 1 insertion(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 4ddd08ce8885..4db92e0a60da 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -2913,8 +2913,8 @@ static int amdgpu_drm_release(struct inode *inode, struct file *filp)
>
>  	if (fpriv) {
>  		fpriv->evf_mgr.fd_closing = true;
> -		amdgpu_userq_mgr_fini(&fpriv->userq_mgr);
>  		amdgpu_eviction_fence_destroy(&fpriv->evf_mgr);
> +		amdgpu_userq_mgr_fini(&fpriv->userq_mgr);
>  	}
>
>  	return drm_release(inode, filp);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index 9fbb04aee97b..d2ce7d86dbc8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -1502,11 +1502,6 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
>  		amdgpu_bo_unreserve(pd);
>  	}
>
> -	if (!fpriv->evf_mgr.fd_closing) {
> -		fpriv->evf_mgr.fd_closing = true;
> -		amdgpu_userq_mgr_fini(&fpriv->userq_mgr);
> -		amdgpu_eviction_fence_destroy(&fpriv->evf_mgr);
> -	}
>  	amdgpu_ctx_mgr_fini(&fpriv->ctx_mgr);
>  	amdgpu_vm_fini(adev, &fpriv->vm);
>
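
For illustration only, the ordering rule from the commit message can be sketched in userspace C with pthreads. This is an analogy under stated assumptions, not amdgpu code: struct queue_mgr, struct fence_mgr and every function below are hypothetical stand-ins, and pthread_join() plays the role of flushing the pending worker before the mutex it uses is destroyed.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the user queue manager: it owns the mutex. */
struct queue_mgr {
	pthread_mutex_t lock;
	int suspended;		/* number of completed worker passes */
};

/* Hypothetical stand-in for the eviction fence manager: it owns the worker. */
struct fence_mgr {
	pthread_t worker;
	bool worker_started;
	struct queue_mgr *qm;
};

/* The worker takes the queue manager's mutex, like the suspend worker. */
static void *fence_worker(void *arg)
{
	struct fence_mgr *fm = arg;

	pthread_mutex_lock(&fm->qm->lock);
	fm->qm->suspended++;
	pthread_mutex_unlock(&fm->qm->lock);
	return NULL;
}

static int queue_mgr_init(struct queue_mgr *qm)
{
	qm->suspended = 0;
	return pthread_mutex_init(&qm->lock, NULL);
}

static int queue_mgr_fini(struct queue_mgr *qm)
{
	return pthread_mutex_destroy(&qm->lock);
}

static int fence_mgr_init(struct fence_mgr *fm, struct queue_mgr *qm)
{
	fm->qm = qm;
	fm->worker_started = false;
	if (pthread_create(&fm->worker, NULL, fence_worker, fm) != 0)
		return -1;
	fm->worker_started = true;
	return 0;
}

/* Waits for the worker to finish, so nothing can touch qm->lock afterwards. */
static void fence_mgr_destroy(struct fence_mgr *fm)
{
	if (fm->worker_started) {
		pthread_join(fm->worker, NULL);
		fm->worker_started = false;
	}
}

/* Returns the number of worker passes observed, or -1 on error. */
int teardown_in_safe_order(void)
{
	struct queue_mgr qm;
	struct fence_mgr fm;

	if (queue_mgr_init(&qm) != 0)
		return -1;
	if (fence_mgr_init(&fm, &qm) != 0) {
		queue_mgr_fini(&qm);
		return -1;
	}

	fence_mgr_destroy(&fm);		/* 1. flush pending work first */
	if (queue_mgr_fini(&qm) != 0)	/* 2. only then destroy the mutex */
		return -1;

	return qm.suspended;
}
```

Swapping the two teardown calls in teardown_in_safe_order() would let fence_worker() lock a mutex that has already been destroyed, which is the userspace equivalent of the DEBUG_LOCKS_WARN_ON splat quoted above.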