On 5/15/25 11:46, Jesse.Zhang wrote:
> The current cleanup order during file descriptor close can lead to
> a race condition where the eviction fence worker attempts to access
> a destroyed mutex from the user queue manager:
> 
> [  517.294055] DEBUG_LOCKS_WARN_ON(lock->magic != lock)
> [  517.294060] WARNING: CPU: 8 PID: 2030 at kernel/locking/mutex.c:564
> [  517.294094] Workqueue: events amdgpu_eviction_fence_suspend_worker [amdgpu]
> 
> The issue occurs because:
> 1. We destroy the user queue manager (including its mutex) first
> 2. Then try to destroy eviction fences which may have pending work
> 3. The eviction fence worker may try to access the already-destroyed mutex
> 
> Fix this by reordering the cleanup to:
> 1. First mark the fd as closing and destroy eviction fences,
>    which flushes any pending work
> 2. Then safely destroy the user queue manager after we're certain
>    no more fence work will be executed
> 
> v2: remove the copy in amdgpu_driver_postclose_kms() (Christian)
> 
> Signed-off-by: Jesse Zhang <jesse.zh...@amd.com>

Reviewed-by: Christian König <christian.koe...@amd.com>

We should probably clean that up further, but that is unnecessary here.

Regards,
Christian.

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 5 -----
>  2 files changed, 1 insertion(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 4ddd08ce8885..4db92e0a60da 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -2913,8 +2913,8 @@ static int amdgpu_drm_release(struct inode *inode, struct file *filp)
>  
>       if (fpriv) {
>               fpriv->evf_mgr.fd_closing = true;
> -             amdgpu_userq_mgr_fini(&fpriv->userq_mgr);
>               amdgpu_eviction_fence_destroy(&fpriv->evf_mgr);
> +             amdgpu_userq_mgr_fini(&fpriv->userq_mgr);
>       }
>  
>       return drm_release(inode, filp);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index 9fbb04aee97b..d2ce7d86dbc8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -1502,11 +1502,6 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
>               amdgpu_bo_unreserve(pd);
>       }
>  
> -     if (!fpriv->evf_mgr.fd_closing) {
> -             fpriv->evf_mgr.fd_closing = true;
> -             amdgpu_userq_mgr_fini(&fpriv->userq_mgr);
> -             amdgpu_eviction_fence_destroy(&fpriv->evf_mgr);
> -     }
>       amdgpu_ctx_mgr_fini(&fpriv->ctx_mgr);
>       amdgpu_vm_fini(adev, &fpriv->vm);
>  
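
For readers following the ordering argument in the commit message, here is a minimal userspace sketch of why flushing the pending fence work has to come before the mutex is destroyed. It is a single-threaded toy model, not driver code: every toy_* name is hypothetical, the "mutex" is just a struct whose magic field mimics the lock->magic check from the warning, and "flushing" is modelled by running the pending worker synchronously.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a debug-checked mutex: magic points at the lock while it is valid. */
struct toy_mutex {
	struct toy_mutex *magic;
};

static void toy_mutex_init(struct toy_mutex *m)    { m->magic = m; }
static void toy_mutex_destroy(struct toy_mutex *m) { m->magic = NULL; }

/* Mirrors the spirit of DEBUG_LOCKS_WARN_ON(lock->magic != lock). */
static bool toy_mutex_is_valid(const struct toy_mutex *m) { return m->magic == m; }

/* Hypothetical per-fd state: one mutex owned by the queue manager, one pending work item. */
struct toy_fpriv {
	struct toy_mutex userq_mutex;
	bool work_pending;
	int bad_accesses;	/* times the worker touched a destroyed mutex */
};

/* The deferred worker needs the queue manager's mutex. */
static void toy_suspend_worker(struct toy_fpriv *f)
{
	if (!toy_mutex_is_valid(&f->userq_mutex))
		f->bad_accesses++;
}

/* Destroying the fence manager flushes pending work, i.e. runs it to completion. */
static void toy_evf_destroy(struct toy_fpriv *f)
{
	if (f->work_pending) {
		f->work_pending = false;
		toy_suspend_worker(f);
	}
}

/* Destroying the queue manager destroys its mutex. */
static void toy_userq_fini(struct toy_fpriv *f)
{
	toy_mutex_destroy(&f->userq_mutex);
}

/* Runs one teardown and returns how many bad mutex accesses it caused. */
int toy_release(bool fences_first)
{
	struct toy_fpriv f = { .work_pending = true };

	toy_mutex_init(&f.userq_mutex);
	assert(toy_mutex_is_valid(&f.userq_mutex));

	if (fences_first) {		/* order after the patch */
		toy_evf_destroy(&f);
		toy_userq_fini(&f);
	} else {			/* order before the patch */
		toy_userq_fini(&f);
		toy_evf_destroy(&f);
	}
	return f.bad_accesses;
}
```

In this model, toy_release(true) reports no bad accesses, while toy_release(false) reports one, because the worker runs after the mutex is gone. The real driver is concurrent, so the original bug shows up as a race rather than deterministically, but the dependency between the two teardown steps is the same.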
