[Public]

From: Koenig, Christian <christian.koe...@amd.com>
Sent: Tuesday, April 22, 2025 9:26 PM
To: Liang, Prike <prike.li...@amd.com>; Yadav, Arvind <arvind.ya...@amd.com>
Cc: Deucher, Alexander <alexander.deuc...@amd.com>; 
amd-gfx@lists.freedesktop.org; Christian König 
<ckoenig.leichtzumer...@gmail.com>
Subject: Re: [PATCH 4/4] drm/amdgpu: free the evf when the attached bo release

Am 22.04.25 um 15:09 schrieb Liang, Prike:


Stop, wait a second. That shouldn't happen at the first place. Why is the 
eviction

fence considered a dependency for page table updates?



When it is added only as bookkeep then we should never consider that here.

Looks like something in the sync obj is messed up.

It is like this. Here, amdgpu_sync_resv is using

DMA_RESV_USAGE_BOOKKEEP.



         int amdgpu_sync_resv() {



                 ..



                 /* TODO: Use DMA_RESV_USAGE_READ here */

That here is the core of the problem.

I've added this TODO item 4 years ago to switch over to DMA_RESV_USAGE_READ 
here when moved all TTM use cases to using 
drm_sched_job_add_resv_dependencies().

That was done, but this TODO here forgotten.

So here the user space submission sync only requires syncing the fences with 
less than DMA_RESV_USAGE_READ usage in the amdgpu_sync_resv(). If so, then the 
eviction fence will not be added to the sync with kernel queue submission. I 
can submit a patch for this change.

Thanks,
Prike



                 dma_resv_for_each_fence(&cursor, resv,

DMA_RESV_USAGE_BOOKKEEP, f) {

                     dma_fence_chain_for_each(f, f) {



                 ..



}

during PT update amdgpu_vm_bo_update() is using sync to moving

fences(Eviction fence) before mapping anything. Because of this Eviction fence 
will

act as dependency.



Yes, since the amdgpu_sync_resv() uses the bookkeep usage, then all the BOs 
reservation fences along with the eviction fence will be returned and added to 
the sync.

Yeah, but that is incorrect.



And with the attached patch, the eviction fence can be released properly when 
the kq and uq are enabled.

We need to fix the underlying bug first before we can work on the next step.

Regards,
Christian.







Thanks,

Prike



~arvind



Regards,

Christian.



, and then the eviction fence will be added as a dependent fence by

propagating with amdgpu_sync_push_to_job(). With removing the eviction fence

from the VM sync at amdgpu_sync_resv(), then the eviction fence can be released

properly.



Thanks,

Prike

Thanks,

Christian.



PS: Please stop calling the eviction fence evf.



     return 0;



  free_err:

@@ -237,6 +234,30 @@ void amdgpu_eviction_fence_detach(struct

amdgpu_eviction_fence_mgr *evf_mgr,

     dma_fence_put(stub);

  }



+void amdgpu_userq_remove_all_eviction_fences(struct amdgpu_bo

+*bo)

Please name that amdgpu_eviction_fence_remove_all().

Noted.



Regards,

Christian.



+{

+   struct dma_resv *resv = &bo->tbo.base._resv;

+   struct dma_fence *fence, *stub;

+   struct dma_resv_iter cursor;

+

+   dma_resv_assert_held(resv);

+

+   stub = dma_fence_get_stub();

+   dma_resv_for_each_fence(&cursor, resv,

DMA_RESV_USAGE_BOOKKEEP, fence) {

+           struct amdgpu_eviction_fence *ev_fence;

+

+           ev_fence = fence_to_evf(fence);

+           if (!ev_fence || !dma_fence_is_signaled(&ev_fence->base))

+                   continue;

+

+           dma_resv_replace_fences(resv, fence->context, stub,

+                           DMA_RESV_USAGE_BOOKKEEP);

+

+   }

+

+   dma_fence_put(stub);

+}

+

  int amdgpu_eviction_fence_init(struct amdgpu_eviction_fence_mgr

*evf_mgr)  {

     /* This needs to be done one time per open */ diff --git

a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h

b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h

index fcd867b7147d..da99ac322a2e 100644

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h

+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h

@@ -66,4 +66,5 @@ amdgpu_eviction_fence_signal(struct

amdgpu_eviction_fence_mgr *evf_mgr,  int

amdgpu_eviction_fence_replace_fence(struct

amdgpu_eviction_fence_mgr

*evf_mgr,

                                 struct drm_exec *exec);

+void amdgpu_userq_remove_all_eviction_fences(struct amdgpu_bo

+*bo);

  #endif

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c

b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c

index 1e73ce30d4d7..f001018a01eb 100644

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c

+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c

@@ -1392,6 +1392,7 @@ void amdgpu_bo_release_notify(struct

ttm_buffer_object *bo)

     amdgpu_vram_mgr_set_cleared(bo->resource);

     dma_resv_add_fence(&bo->base._resv, fence,

DMA_RESV_USAGE_KERNEL);

     dma_fence_put(fence);

+   amdgpu_userq_remove_all_eviction_fences(abo);



  out:

     dma_resv_unlock(&bo->base._resv);

Reply via email to