This commit fixes a potential race condition in the userqueue fence signaling mechanism by replacing dma_fence_is_signaled_locked() with dma_fence_is_signaled().
The issue occurred because: 1. dma_fence_is_signaled_locked() should only be used when holding the fence's individual lock, not just the fence list lock 2. Using the locked variant without the proper fence lock could lead to double-signaling scenarios: - Hardware completion signals the fence - Software path also tries to signal the same fence By using dma_fence_is_signaled() instead, we properly handle the locking hierarchy and avoid the race condition while still maintaining the necessary synchronization through the fence_list_lock. Signed-off-by: Jesse Zhang <[email protected]> --- drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c index 59d0abbdfc2f..ac366e1bc9ab 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c @@ -289,7 +289,16 @@ static int amdgpu_userq_fence_create(struct amdgpu_usermode_queue *userq, /* Check if hardware has already processed the job */ spin_lock_irqsave(&fence_drv->fence_list_lock, flags); - if (!dma_fence_is_signaled_locked(fence)) + /* + * Use dma_fence_is_signaled() instead of dma_fence_is_signaled_locked() + * because we only hold the fence_list_lock, not the individual fence lock. + * dma_fence_is_signaled_locked() should only be used when the fence's + * own lock is held, which isn't the case here. Using the locked variant + * can lead to race conditions where the fence gets signaled twice: + * 1. Hardware completion signals the fence + * 2. Software path also tries to signal the same fence + */ + if (!dma_fence_is_signaled(fence)) list_add_tail(&userq_fence->link, &fence_drv->fences); else dma_fence_put(fence); -- 2.49.0
