On 30/04/2025 11:49, Dan Carpenter wrote:
On Wed, Apr 30, 2025 at 09:28:59AM +0000, Sharma, Shashank wrote:
[AMD Official Use Only - AMD Internal Distribution Only]
Hello Dan,
________________________________
From: Dan Carpenter
Sent: Wednesday, April 30, 2025 10:05 AM
To: Deucher, Alexander
Cc: Koenig, Christian; David Airlie; Simona Vetter; Sharma, Shashank; Khatri,
Sunil; Yadav, Arvind; Paneer Selvam, Arunpravin; amd-...@lists.freedesktop.org;
dri-devel@lists.freedesktop.org; linux-ker...@vger.kernel.org;
kernel-janit...@vger.kernel.org
Subject: [PATCH] drm/amdgpu/userq: remove unnecessary NULL check
The "ticket" pointer points to in the middle of the &exec struct so it
can't be NULL. Remove the check.
Signed-off-by: Dan Carpenter <dan.carpen...@linaro.org>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
index b0e8098a3988..7505d920fb3d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
@@ -631,7 +631,7 @@ amdgpu_userq_validate_bos(struct amdgpu_userq_mgr *uq_mgr)
clear = false;
unlock = true;
/* The caller is already holding the reservation lock */
- } else if (ticket && dma_resv_locking_ctx(resv) == ticket) {
+ } else if (dma_resv_locking_ctx(resv) == ticket) {
Its a Nack for me, There are a few situations (particularly during the
first launch of the desktop, and also when eviction fence and new queue
creation are working in parallel) where this ticket can be NULL, we
observed it during the stress validation and hence added this check,
It shouldn't be NULL. It sounds like you are experiencing stack
corruption and this is just a bandaid.
drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
566 static int
567 amdgpu_userq_validate_bos(struct amdgpu_userq_mgr *uq_mgr)
568 {
569 struct amdgpu_fpriv *fpriv = uq_mgr_to_fpriv(uq_mgr);
570 struct amdgpu_vm *vm = &fpriv->vm;
571 struct amdgpu_device *adev = uq_mgr->adev;
572 struct amdgpu_bo_va *bo_va;
573 struct ww_acquire_ctx *ticket;
574 struct drm_exec exec;
^^^^^^^^^^^^^^^^^^^^^
The "exec" struct is declared on the stack.
575 struct amdgpu_bo *bo;
576 struct dma_resv *resv;
577 bool clear, unlock;
578 int ret = 0;
579
580 drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES, 0);
581 drm_exec_until_all_locked(&exec) {
582 ret = amdgpu_vm_lock_pd(vm, &exec, 2);
583 drm_exec_retry_on_contention(&exec);
584 if (unlikely(ret)) {
585 DRM_ERROR("Failed to lock PD\n");
586 goto unlock_all;
587 }
588
589 /* Lock the done list */
590 list_for_each_entry(bo_va, &vm->done, base.vm_status) {
591 bo = bo_va->base.bo;
592 if (!bo)
593 continue;
594
595 ret = drm_exec_lock_obj(&exec, &bo->tbo.base);
596 drm_exec_retry_on_contention(&exec);
597 if (unlikely(ret))
598 goto unlock_all;
599 }
600 }
601
602 spin_lock(&vm->status_lock);
603 while (!list_empty(&vm->moved)) {
604 bo_va = list_first_entry(&vm->moved, struct
amdgpu_bo_va,
605 base.vm_status);
606 spin_unlock(&vm->status_lock);
607
608 /* Per VM BOs never need to bo cleared in the page
tables */
609 ret = amdgpu_vm_bo_update(adev, bo_va, false);
610 if (ret)
611 goto unlock_all;
612 spin_lock(&vm->status_lock);
613 }
614
615 ticket = &exec.ticket;
^^^^^^^^^^^^^^^^^^^^^
ticket is only set here. We know that &exec is non-NULL because it's
declared on the stack. ticket is 4 bytes into the middle of a non-NULL
struct. It is impossible for ticket to be NULL here.
Yep, you are right. I just did a code review, and probably we added that
NULL check before we had the right locks in place, and there was a race
between eviction thread and the UQ create thread, causing corruption.
Please feel free to use Acked-by: Shashank Sharma <shashank.sha...@amd.com>
- Shashank
616 while (!list_empty(&vm->invalidated)) {
617 bo_va = list_first_entry(&vm->invalidated, struct
amdgpu_bo_va,
618 base.vm_status);
619 resv = bo_va->base.bo->tbo.base.resv;
620 spin_unlock(&vm->status_lock);
621
622 bo = bo_va->base.bo;
623 ret = amdgpu_userq_validate_vm_bo(NULL, bo);
624 if (ret) {
625 DRM_ERROR("Failed to validate BO\n");
626 goto unlock_all;
627 }
628
629 /* Try to reserve the BO to avoid clearing its ptes */
630 if (!adev->debug_vm && dma_resv_trylock(resv)) {
631 clear = false;
632 unlock = true;
633 /* The caller is already holding the reservation lock
*/
634 } else if (ticket && dma_resv_locking_ctx(resv) ==
ticket) {
I've included the whole rest of the function so that we can see it is not
set a second time.
regards,
dan carpenter
635 clear = false;
636 unlock = false;
637 /* Somebody else is using the BO right now */
638 } else {
639 clear = true;
640 unlock = false;
641 }
642
643 ret = amdgpu_vm_bo_update(adev, bo_va, clear);
644
645 if (unlock)
646 dma_resv_unlock(resv);
647 if (ret)
648 goto unlock_all;
649
650 spin_lock(&vm->status_lock);
651 }
652 spin_unlock(&vm->status_lock);
653
654 ret = amdgpu_eviction_fence_replace_fence(&fpriv->evf_mgr,
&exec);
655 if (ret)
656 DRM_ERROR("Failed to replace eviction fence\n");
657
658 unlock_all:
659 drm_exec_fini(&exec);
660 return ret;
661 }