Am 22.04.25 um 15:09 schrieb Liang, Prike:
>>> Stop, wait a second. That shouldn't happen at the first place. Why is the 
>>> eviction
>> fence considered a dependency for page table updates?
>>> When it is added only as bookkeep then we should never consider that here.
>> Looks like something in the sync obj is messed up.
>> It is like this. Here, amdgpu_sync_resv is using
>> DMA_RESV_USAGE_BOOKKEEP.
>>
>>          int amdgpu_sync_resv() {
>>
>>                  ..
>>
>>                  /* TODO: Use DMA_RESV_USAGE_READ here */

That here is the core of the problem.

I've added this TODO item 4 years ago to switch over to DMA_RESV_USAGE_READ 
here when moved all TTM use cases to using 
drm_sched_job_add_resv_dependencies().

That was done, but this TODO here forgotten.

>>                  dma_resv_for_each_fence(&cursor, resv,
>> DMA_RESV_USAGE_BOOKKEEP, f) {
>>                      dma_fence_chain_for_each(f, f) {
>>
>>                  ..
>>
>> }
>> during PT update amdgpu_vm_bo_update() is using sync to moving
>> fences(Eviction fence) before mapping anything. Because of this Eviction 
>> fence will
>> act as dependency.
> Yes, since the amdgpu_sync_resv() uses the bookkeep usage, then all the BOs 
> reservation fences along with the eviction fence will be returned and added 
> to the sync.

Yeah, but that is incorrect.

> And with the attached patch, the eviction fence can be released properly when 
> the kq and uq are enabled.

We need to fix the underlying bug first before we can work on the next step.

Regards,
Christian.

>
> Thanks,
> Prike
>
>> ~arvind
>>
>>> Regards,
>>> Christian.
>>>
>>>> , and then the eviction fence will be added as a dependent fence by
>> propagating with amdgpu_sync_push_to_job(). With removing the eviction fence
>> from the VM sync at amdgpu_sync_resv(), then the eviction fence can be 
>> released
>> properly.
>>>> Thanks,
>>>> Prike
>>>>> Thanks,
>>>>> Christian.
>>>>>
>>>>> PS: Please stop calling the eviction fence evf.
>>>>>
>>>>>>>>      return 0;
>>>>>>>>
>>>>>>>>   free_err:
>>>>>>>> @@ -237,6 +234,30 @@ void amdgpu_eviction_fence_detach(struct
>>>>>>> amdgpu_eviction_fence_mgr *evf_mgr,
>>>>>>>>      dma_fence_put(stub);
>>>>>>>>   }
>>>>>>>>
>>>>>>>> +void amdgpu_userq_remove_all_eviction_fences(struct amdgpu_bo
>>>>>>>> +*bo)
>>>>>>> Please name that amdgpu_eviction_fence_remove_all().
>>>>>> Noted.
>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>>
>>>>>>>> +{
>>>>>>>> +   struct dma_resv *resv = &bo->tbo.base._resv;
>>>>>>>> +   struct dma_fence *fence, *stub;
>>>>>>>> +   struct dma_resv_iter cursor;
>>>>>>>> +
>>>>>>>> +   dma_resv_assert_held(resv);
>>>>>>>> +
>>>>>>>> +   stub = dma_fence_get_stub();
>>>>>>>> +   dma_resv_for_each_fence(&cursor, resv,
>>>>>>> DMA_RESV_USAGE_BOOKKEEP, fence) {
>>>>>>>> +           struct amdgpu_eviction_fence *ev_fence;
>>>>>>>> +
>>>>>>>> +           ev_fence = fence_to_evf(fence);
>>>>>>>> +           if (!ev_fence || !dma_fence_is_signaled(&ev_fence->base))
>>>>>>>> +                   continue;
>>>>>>>> +
>>>>>>>> +           dma_resv_replace_fences(resv, fence->context, stub,
>>>>>>>> +                           DMA_RESV_USAGE_BOOKKEEP);
>>>>>>>> +
>>>>>>>> +   }
>>>>>>>> +
>>>>>>>> +   dma_fence_put(stub);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>   int amdgpu_eviction_fence_init(struct amdgpu_eviction_fence_mgr
>>>>>>>> *evf_mgr)  {
>>>>>>>>      /* This needs to be done one time per open */ diff --git
>>>>>>>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
>>>>>>>> index fcd867b7147d..da99ac322a2e 100644
>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_eviction_fence.h
>>>>>>>> @@ -66,4 +66,5 @@ amdgpu_eviction_fence_signal(struct
>>>>>>>> amdgpu_eviction_fence_mgr *evf_mgr,  int
>>>>>>>> amdgpu_eviction_fence_replace_fence(struct
>>>>>>>> amdgpu_eviction_fence_mgr
>>>>>>> *evf_mgr,
>>>>>>>>                                  struct drm_exec *exec);
>>>>>>>> +void amdgpu_userq_remove_all_eviction_fences(struct amdgpu_bo
>>>>>>>> +*bo);
>>>>>>>>   #endif
>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>>>>>> index 1e73ce30d4d7..f001018a01eb 100644
>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>>>>>> @@ -1392,6 +1392,7 @@ void amdgpu_bo_release_notify(struct
>>>>>>> ttm_buffer_object *bo)
>>>>>>>>      amdgpu_vram_mgr_set_cleared(bo->resource);
>>>>>>>>      dma_resv_add_fence(&bo->base._resv, fence,
>>>>>>> DMA_RESV_USAGE_KERNEL);
>>>>>>>>      dma_fence_put(fence);
>>>>>>>> +   amdgpu_userq_remove_all_eviction_fences(abo);
>>>>>>>>
>>>>>>>>   out:
>>>>>>>>      dma_resv_unlock(&bo->base._resv);

Reply via email to