On 5/23/25 16:16, Danilo Krummrich wrote:
> On Fri, May 23, 2025 at 04:11:39PM +0200, Danilo Krummrich wrote:
>> On Fri, May 23, 2025 at 02:56:40PM +0200, Christian König wrote:
>>> It turned out that we can actually massively optimize here.
>>>
>>> The previous code was horribly inefficient, since it constantly released
>>> and re-acquired the lock of the xarray and started each iteration from the
>>> base of the array to avoid concurrent modification, which in our case
>>> doesn't exist.
>>>
>>> In addition to that, the xas_find() and xas_store() functions are explicitly
>>> made in a way so that you can efficiently check entries and, if you don't
>>> find a match, store a new one at the end or replace existing ones.
>>>
>>> So use xas_for_each()/xas_store() instead of xa_for_each()/xa_alloc().
>>> It's a bit more code, but should be much faster in the end.
>>
>> This commit message neither explains the motivation of the commit nor what
>> it does. It describes what instead belongs in the changelog between versions.
>
> Sorry, this is wrong. I got confused, the commit message is perfectly fine. :)
>
> The rest still applies though.
>
>> Speaking of versioning of the patch series, AFAIK there were previous versions,
>> but this series was sent as a whole new series -- why?
>>
>> Please resend with a proper commit message, version and changelog. Thanks!
Well, Philip asked to remove the changelog. I'm happy to bring it back, but yeah...

Regards,
Christian.

>>
>>> Signed-off-by: Christian König <christian.koe...@amd.com>
>>> ---
>>>  drivers/gpu/drm/scheduler/sched_main.c | 29 ++++++++++++++++++--------
>>>  1 file changed, 20 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>> index f7118497e47a..cf200b1b643e 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -871,10 +871,8 @@ EXPORT_SYMBOL(drm_sched_job_arm);
>>>  int drm_sched_job_add_dependency(struct drm_sched_job *job,
>>>  				 struct dma_fence *fence)
>>>  {
>>> +	XA_STATE(xas, &job->dependencies, 0);
>>>  	struct dma_fence *entry;
>>> -	unsigned long index;
>>> -	u32 id = 0;
>>> -	int ret;
>>>
>>>  	if (!fence)
>>>  		return 0;
>>> @@ -883,24 +881,37 @@ int drm_sched_job_add_dependency(struct drm_sched_job *job,
>>>  	 * This lets the size of the array of deps scale with the number of
>>>  	 * engines involved, rather than the number of BOs.
>>>  	 */
>>> -	xa_for_each(&job->dependencies, index, entry) {
>>> +	xa_lock(&job->dependencies);
>>> +	xas_for_each(&xas, entry, ULONG_MAX) {
>>>  		if (entry->context != fence->context)
>>>  			continue;
>>>
>>>  		if (dma_fence_is_later(fence, entry)) {
>>>  			dma_fence_put(entry);
>>> -			xa_store(&job->dependencies, index, fence, GFP_KERNEL);
>>> +			xas_store(&xas, fence);
>>>  		} else {
>>>  			dma_fence_put(fence);
>>>  		}
>>> -		return 0;
>>> +		xa_unlock(&job->dependencies);
>>> +		return xas_error(&xas);
>>>  	}
>>>
>>> -	ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b, GFP_KERNEL);
>>> -	if (ret != 0)
>>> +retry:
>>> +	entry = xas_store(&xas, fence);
>>> +	xa_unlock(&job->dependencies);
>>> +
>>> +	/* There shouldn't be any concurrent add, so no need to loop again */
>>
>> Concurrency shouldn't matter, xas_nomem() stores the pre-allocated memory in
>> the XA_STATE, not the xarray. Hence, I think we should remove the comment.
>>
>>> +	if (xas_nomem(&xas, GFP_KERNEL)) {
>>> +		xa_lock(&job->dependencies);
>>> +		goto retry;
>>
>> Please don't use a goto here; if we failed to allocate memory here, this would
>> be an endless loop until we eventually succeed. It would be equivalent to:
>>
>> 	while (!ptr) {
>> 		ptr = kmalloc();
>> 	}
>>
>> Instead just take the lock and call xas_store() again.
>>
>>> +	}
>>> +
>>> +	if (xas_error(&xas))
>>>  		dma_fence_put(fence);
>>> +	else
>>> +		WARN_ON(entry);
>>
>> Please don't call WARN_ON() here, this isn't fatal, we only need to return the
>> error code.
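For reference, a minimal sketch of how the function could look with the review comments applied: the insertion path retries xas_store() once after xas_nomem() instead of looping back via goto, and the function returns xas_error() rather than warning. This is only an illustration of the suggested structure, not the actual follow-up revision of the patch.

```c
/*
 * Illustrative sketch only -- drm_sched_job_add_dependency() with the
 * review feedback applied (no goto, no WARN_ON(), single bounded retry
 * via xas_nomem()).  Not the actual follow-up patch.
 */
int drm_sched_job_add_dependency(struct drm_sched_job *job,
				 struct dma_fence *fence)
{
	XA_STATE(xas, &job->dependencies, 0);
	struct dma_fence *entry;

	if (!fence)
		return 0;

	/*
	 * Deduplicate by fence context so the array scales with the number
	 * of engines involved, rather than the number of BOs.
	 */
	xa_lock(&job->dependencies);
	xas_for_each(&xas, entry, ULONG_MAX) {
		if (entry->context != fence->context)
			continue;

		if (dma_fence_is_later(fence, entry)) {
			dma_fence_put(entry);
			xas_store(&xas, fence);
		} else {
			dma_fence_put(fence);
		}
		xa_unlock(&job->dependencies);
		return xas_error(&xas);
	}

	/* No fence for this context yet, store the new one. */
	xas_store(&xas, fence);
	xa_unlock(&job->dependencies);

	/*
	 * xas_nomem() attaches pre-allocated memory to the XA_STATE (not
	 * the xarray), so retrying xas_store() once is enough; if the
	 * allocation failed, xas_error() reports -ENOMEM below.
	 */
	if (xas_nomem(&xas, GFP_KERNEL)) {
		xa_lock(&job->dependencies);
		xas_store(&xas, fence);
		xa_unlock(&job->dependencies);
	}

	if (xas_error(&xas))
		dma_fence_put(fence);

	return xas_error(&xas);
}
```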