[Public]

> -----Original Message-----
> From: Christian König <ckoenig.leichtzumer...@gmail.com>
> Sent: Thursday, April 17, 2025 3:40 PM
> To: Liang, Prike <prike.li...@amd.com>; Koenig, Christian
> <christian.koe...@amd.com>; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander <alexander.deuc...@amd.com>
> Subject: Re: [PATCH 4/4] drm/amdgpu: free the evf when the attached bo release
>
> Am 16.04.25 um 16:47 schrieb Liang, Prike:
> > [Public]
> >
> >> From: Koenig, Christian <christian.koe...@amd.com>
> >> Sent: Wednesday, April 16, 2025 7:07 PM
> >> To: Liang, Prike <prike.li...@amd.com>; amd-gfx@lists.freedesktop.org
> >> Cc: Deucher, Alexander <alexander.deuc...@amd.com>
> >> Subject: Re: [PATCH 4/4] drm/amdgpu: free the evf when the attached
> >> bo release
> >>
> >> Am 16.04.25 um 10:50 schrieb Prike Liang:
> >>> Free the evf when the attached bo released. The evf still be
> >>> dependent on and referred to by the attached bo that is scheduled by
> >>> the kernel queue SDMA or gfx after the evf signalled.
> >>>
> >>> Signed-off-by: Prike Liang <prike.li...@amd.com>
> >>> ---
> >>>  .../drm/amd/amdgpu/amdgpu_eviction_fence.c    | 31 ++++++++++++++++--
> -
> >>>  .../drm/amd/amdgpu/amdgpu_eviction_fence.h    |  1 +
> >>>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c    |  1 +
> >>>  3 files changed, 28 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> >>> index b34225bbd85d..60be1ac5047d 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.c
> >>> @@ -27,6 +27,7 @@
> >>>
> >>>  #define work_to_evf_mgr(w, name) container_of(w, struct
> >>> amdgpu_eviction_fence_mgr, name)  #define evf_mgr_to_fpriv(e)
> >>> container_of(e, struct amdgpu_fpriv, evf_mgr)
> >>> +#define fence_to_evf(f)  container_of(f, struct
> >>> +amdgpu_eviction_fence, base)
> >>>
> >>>  static const char *
> >>>  amdgpu_eviction_fence_get_driver_name(struct dma_fence *fence) @@
> >>> -47,7 +48,7 @@ int  amdgpu_eviction_fence_replace_fence(struct
> >>> amdgpu_eviction_fence_mgr *evf_mgr,
> >>>                                 struct drm_exec *exec)  {
> >>> -   struct amdgpu_eviction_fence *old_ef, *new_ef;
> >>> +   struct amdgpu_eviction_fence *new_ef;
> >>>     struct drm_gem_object *obj;
> >>>     unsigned long index;
> >>>     int ret;
> >>> @@ -72,7 +73,6 @@ amdgpu_eviction_fence_replace_fence(struct
> >>> amdgpu_eviction_fence_mgr *evf_mgr,
> >>>
> >>>     /* Update the eviction fence now */
> >>>     spin_lock(&evf_mgr->ev_fence_lock);
> >>> -   old_ef = evf_mgr->ev_fence;
> >>>     evf_mgr->ev_fence = new_ef;
> >>>     spin_unlock(&evf_mgr->ev_fence_lock);
> >>>
> >>> @@ -102,9 +102,6 @@ amdgpu_eviction_fence_replace_fence(struct
> >> amdgpu_eviction_fence_mgr *evf_mgr,
> >>>             }
> >>>     }
> >>>
> >>> -   /* Free old fence */
> >>> -   if (old_ef)
> >>> -           dma_fence_put(&old_ef->base);
> >> That change looks completely incorrect to me, you will now leak the old 
> >> fence.
> > The eviction fence is attached and shared by all the restored validated VM 
> > BOs
> during UQ restore, and at this placement the eviction fence is only detached 
> from
> one of the BOs. Using amdgpu_userq_remove_all_eviction_fences() will walk over
> the resv objects and detach the fence from the resv objs when freeing the BO.
>
> Yeah, but that doesn't justify this change here. See you're completely 
> messing up
> the fence reference count with that.
>
> >
> > But there's a problem: even though dropping all the evf attached to VM BOs 
> > with
> this patch, the evf still referred to by the SDMA and GFX kernel queue jobs 
> at the
> case when enabling the kq and uq at the same time. Thoughts?
>
> Mhm, the eviction fence is always added as bookmark isn't it? As long as the 
> GFX
> and SDMA jobs are not for evicting something then they should only depend on
> fences with usage < bookmark.
>
> Can you dig up when they are added to the dependencies of the job?

When the eviction fence was added to the user queue VM BOs reservation and then 
updated the BO page table, which will add the eviction fence to the VM sync at 
amdgpu_sync_resv(), and then the eviction fence will be added as a dependent 
fence by propagating with amdgpu_sync_push_to_job(). With removing the eviction 
fence from the VM sync at amdgpu_sync_resv(), then the eviction fence can be 
released properly.

Thanks,
Prike
>
> Thanks,
> Christian.
>
> PS: Please stop calling the eviction fence evf.
>
> >
> >>>     return 0;
> >>>
> >>>  free_err:
> >>> @@ -237,6 +234,30 @@ void amdgpu_eviction_fence_detach(struct
> >> amdgpu_eviction_fence_mgr *evf_mgr,
> >>>     dma_fence_put(stub);
> >>>  }
> >>>
> >>> +void amdgpu_userq_remove_all_eviction_fences(struct amdgpu_bo *bo)
> >> Please name that amdgpu_eviction_fence_remove_all().
> > Noted.
> >
> >> Regards,
> >> Christian.
> >>
> >>> +{
> >>> +   struct dma_resv *resv = &bo->tbo.base._resv;
> >>> +   struct dma_fence *fence, *stub;
> >>> +   struct dma_resv_iter cursor;
> >>> +
> >>> +   dma_resv_assert_held(resv);
> >>> +
> >>> +   stub = dma_fence_get_stub();
> >>> +   dma_resv_for_each_fence(&cursor, resv,
> >> DMA_RESV_USAGE_BOOKKEEP, fence) {
> >>> +           struct amdgpu_eviction_fence *ev_fence;
> >>> +
> >>> +           ev_fence = fence_to_evf(fence);
> >>> +           if (!ev_fence || !dma_fence_is_signaled(&ev_fence->base))
> >>> +                   continue;
> >>> +
> >>> +           dma_resv_replace_fences(resv, fence->context, stub,
> >>> +                           DMA_RESV_USAGE_BOOKKEEP);
> >>> +
> >>> +   }
> >>> +
> >>> +   dma_fence_put(stub);
> >>> +}
> >>> +
> >>>  int amdgpu_eviction_fence_init(struct amdgpu_eviction_fence_mgr
> >>> *evf_mgr)  {
> >>>     /* This needs to be done one time per open */ diff --git
> >>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
> >>> index fcd867b7147d..da99ac322a2e 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
> >>> @@ -66,4 +66,5 @@ amdgpu_eviction_fence_signal(struct
> >>> amdgpu_eviction_fence_mgr *evf_mgr,  int
> >>> amdgpu_eviction_fence_replace_fence(struct amdgpu_eviction_fence_mgr
> >> *evf_mgr,
> >>>                                 struct drm_exec *exec);
> >>> +void amdgpu_userq_remove_all_eviction_fences(struct amdgpu_bo *bo);
> >>>  #endif
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> >>> index 1e73ce30d4d7..f001018a01eb 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> >>> @@ -1392,6 +1392,7 @@ void amdgpu_bo_release_notify(struct
> >> ttm_buffer_object *bo)
> >>>     amdgpu_vram_mgr_set_cleared(bo->resource);
> >>>     dma_resv_add_fence(&bo->base._resv, fence,
> >> DMA_RESV_USAGE_KERNEL);
> >>>     dma_fence_put(fence);
> >>> +   amdgpu_userq_remove_all_eviction_fences(abo);
> >>>
> >>>  out:
> >>>     dma_resv_unlock(&bo->base._resv);

Reply via email to