Am 22.04.25 um 15:09 schrieb Liang, Prike: >>> Stop, wait a second. That shouldn't happen at the first place. Why is the >>> eviction >> fence considered a dependency for page table updates? >>> When it is added only as bookkeep then we should never consider that here. >> Looks like something in the sync obj is messed up. >> It is like this. Here, amdgpu_sync_resv is using >> DMA_RESV_USAGE_BOOKKEEP. >> >> int amdgpu_sync_resv() { >> >> .. >> >> /* TODO: Use DMA_RESV_USAGE_READ here */
That here is the core of the problem. I've added this TODO item 4 years ago to switch over to DMA_RESV_USAGE_READ here when moved all TTM use cases to using drm_sched_job_add_resv_dependencies(). That was done, but this TODO here forgotten. >> dma_resv_for_each_fence(&cursor, resv, >> DMA_RESV_USAGE_BOOKKEEP, f) { >> dma_fence_chain_for_each(f, f) { >> >> .. >> >> } >> during PT update amdgpu_vm_bo_update() is using sync to moving >> fences(Eviction fence) before mapping anything. Because of this Eviction >> fence will >> act as dependency. > Yes, since the amdgpu_sync_resv() uses the bookkeep usage, then all the BOs > reservation fences along with the eviction fence will be returned and added > to the sync. Yeah, but that is incorrect. > And with the attached patch, the eviction fence can be released properly when > the kq and uq are enabled. We need to fix the underlying bug first before we can work on the next step. Regards, Christian. > > Thanks, > Prike > >> ~arvind >> >>> Regards, >>> Christian. >>> >>>> , and then the eviction fence will be added as a dependent fence by >> propagating with amdgpu_sync_push_to_job(). With removing the eviction fence >> from the VM sync at amdgpu_sync_resv(), then the eviction fence can be >> released >> properly. >>>> Thanks, >>>> Prike >>>>> Thanks, >>>>> Christian. >>>>> >>>>> PS: Please stop calling the eviction fence evf. >>>>> >>>>>>>> return 0; >>>>>>>> >>>>>>>> free_err: >>>>>>>> @@ -237,6 +234,30 @@ void amdgpu_eviction_fence_detach(struct >>>>>>> amdgpu_eviction_fence_mgr *evf_mgr, >>>>>>>> dma_fence_put(stub); >>>>>>>> } >>>>>>>> >>>>>>>> +void amdgpu_userq_remove_all_eviction_fences(struct amdgpu_bo >>>>>>>> +*bo) >>>>>>> Please name that amdgpu_eviction_fence_remove_all(). >>>>>> Noted. >>>>>> >>>>>>> Regards, >>>>>>> Christian. >>>>>>> >>>>>>>> +{ >>>>>>>> + struct dma_resv *resv = &bo->tbo.base._resv; >>>>>>>> + struct dma_fence *fence, *stub; >>>>>>>> + struct dma_resv_iter cursor; >>>>>>>> + >>>>>>>> + dma_resv_assert_held(resv); >>>>>>>> + >>>>>>>> + stub = dma_fence_get_stub(); >>>>>>>> + dma_resv_for_each_fence(&cursor, resv, >>>>>>> DMA_RESV_USAGE_BOOKKEEP, fence) { >>>>>>>> + struct amdgpu_eviction_fence *ev_fence; >>>>>>>> + >>>>>>>> + ev_fence = fence_to_evf(fence); >>>>>>>> + if (!ev_fence || !dma_fence_is_signaled(&ev_fence->base)) >>>>>>>> + continue; >>>>>>>> + >>>>>>>> + dma_resv_replace_fences(resv, fence->context, stub, >>>>>>>> + DMA_RESV_USAGE_BOOKKEEP); >>>>>>>> + >>>>>>>> + } >>>>>>>> + >>>>>>>> + dma_fence_put(stub); >>>>>>>> +} >>>>>>>> + >>>>>>>> int amdgpu_eviction_fence_init(struct amdgpu_eviction_fence_mgr >>>>>>>> *evf_mgr) { >>>>>>>> /* This needs to be done one time per open */ diff --git >>>>>>>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h >>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h >>>>>>>> index fcd867b7147d..da99ac322a2e 100644 >>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h >>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h >>>>>>>> @@ -66,4 +66,5 @@ amdgpu_eviction_fence_signal(struct >>>>>>>> amdgpu_eviction_fence_mgr *evf_mgr, int >>>>>>>> amdgpu_eviction_fence_replace_fence(struct >>>>>>>> amdgpu_eviction_fence_mgr >>>>>>> *evf_mgr, >>>>>>>> struct drm_exec *exec); >>>>>>>> +void amdgpu_userq_remove_all_eviction_fences(struct amdgpu_bo >>>>>>>> +*bo); >>>>>>>> #endif >>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c >>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c >>>>>>>> index 1e73ce30d4d7..f001018a01eb 100644 >>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c >>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c >>>>>>>> @@ -1392,6 +1392,7 @@ void amdgpu_bo_release_notify(struct >>>>>>> ttm_buffer_object *bo) >>>>>>>> amdgpu_vram_mgr_set_cleared(bo->resource); >>>>>>>> dma_resv_add_fence(&bo->base._resv, fence, >>>>>>> DMA_RESV_USAGE_KERNEL); >>>>>>>> dma_fence_put(fence); >>>>>>>> + amdgpu_userq_remove_all_eviction_fences(abo); >>>>>>>> >>>>>>>> out: >>>>>>>> dma_resv_unlock(&bo->base._resv);