On Thu Jan 15, 2026 at 9:28 AM CET, Tvrtko Ursulin wrote:
>
> On 14/01/2026 22:13, Danilo Krummrich wrote:
>> On Fri Dec 19, 2025 at 2:53 PM CET, Tvrtko Ursulin wrote:
>>> diff --git a/drivers/gpu/drm/scheduler/sched_rq.c b/drivers/gpu/drm/scheduler/sched_rq.c
>>> index 2d1f579d8352..2fde309d02a6 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_rq.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_rq.c
>>> @@ -16,6 +16,35 @@ drm_sched_entity_compare_before(struct rb_node *a, const struct rb_node *b)
>>>     return ktime_before(ea->oldest_job_waiting, eb->oldest_job_waiting);
>>>   }
>>>   
>>> +static void drm_sched_rq_update_prio(struct drm_sched_rq *rq)
>>> +{
>>> +   enum drm_sched_priority prio = DRM_SCHED_PRIORITY_INVALID;
>>> +   struct rb_node *rb;
>>> +
>>> +   lockdep_assert_held(&rq->lock);
>>> +
>>> +   rb = rb_first_cached(&rq->rb_tree_root);
>>> +   if (rb) {
>>> +           struct drm_sched_entity *entity =
>>> +                   rb_entry(rb, typeof(*entity), rb_tree_node);
>>> +
>>> +           /*
>>> +            * The normal locking order is entity then run-queue so taking
>>> +            * the entity lock here would be a locking inversion for the
>>> +            * case when the current head of the run-queue is different from
>>> +            * the one we already have locked. The unlocked read is fine
>>> +            * though, because if the priority had just changed it is no big
>>> +            * deal for our algorithm, but just a transient reachable only
>>> +            * by drivers with userspace dynamic priority changes API. Equal
>>> +            * in effect to the priority change becoming visible a few
>>> +            * instructions later.
>>> +            */
>>> +           prio = READ_ONCE(entity->priority);
>> 
>> I still think that we should address the root cause of the lock inversion
>> problem instead.
>> 
>> I previously mentioned that I could take a look at this at the beginning of
>> this year, which I can do soon.
>> 
>> In the meantime, can you please explain what the problem is with this specific
>> case? This function is only ever called from drm_sched_rq_remove_fifo_locked()
>> and drm_sched_rq_update_fifo_locked(), which already seem to hold both locks.
>
> The entity which is locked is likely not the same as the entity at the 
> head of the run-queue in either call chain.
>
> In one case we have just removed the locked entity from the run-queue, 
> while in the other the tree has been re-balanced, so a different entity 
> may have taken the head position.

Ick! That makes it even worse, because it means that even if we were able to
take the entity lock here, we would still be prone to lock inversion between
entities.

I.e. that is a strong indicator that it is all the more necessary to revisit
the locking design in general.
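
To make it concrete, here is a minimal sketch of the deadlock cycle, using
hypothetical stand-in types rather than the real scheduler structs:

  #include <linux/spinlock.h>

  /* Hypothetical stand-in types, not the real scheduler structs. */
  struct toy_entity {
          spinlock_t lock;
  };

  struct toy_rq {
          spinlock_t lock;
          struct toy_entity *head;
  };

  /* Normal lock order: entity->lock, then rq->lock. */
  static void toy_update(struct toy_entity *e, struct toy_rq *rq)
  {
          spin_lock(&e->lock);
          spin_lock(&rq->lock);
          /*
           * Taking rq->head->lock here is the problem: thread 1 runs
           * toy_update(A, rq) and blocks on rq->lock, while thread 2,
           * already holding B->lock and rq->lock, finds A at the head
           * and blocks on A->lock. T1 waits on T2 (rq->lock), T2 waits
           * on T1 (A->lock) -- a deadlock cycle across two entities.
           */
          spin_unlock(&rq->lock);
          spin_unlock(&e->lock);
  }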

> Also to note is that in 99% of cases entity->priority is invariant. Only 
> amdgpu allows the priority to change after entity creation, so for the 
> rest locking would not gain anything.
>
> Even for amdgpu the unlocked read is not very relevant, since the only 
> thing this is used for is determining the run-queue insertion position 
> of a re-joining entity. So the worst thing that could happen, if a 
> userspace thread raced a set-priority call with the job worker picking 
> the next job, is to *one time* pick a different job.

I get that; it is less that dealing with the priority field by itself is a huge
issue we can't handle, and more that the above workaround clearly points at a
(locking) design issue, which we should not ignore. It's not only about the
code working or being correct, it's also about maintainability.

(Even though I'm aware that DRM scheduler maintainability is a bit like the
DRM equivalent of the infamous 3x+1 problem. :)
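
For reference, the benign race boils down to roughly the following; the writer
side is illustrative, not the actual amdgpu code:

  /* Writer: userspace-triggered priority change (hypothetical path). */
  WRITE_ONCE(entity->priority, new_prio);

  /* Reader: drm_sched_rq_update_prio() from the patch above. */
  prio = READ_ONCE(entity->priority);

The reader may observe either the old or the new value, which is
indistinguishable from the write becoming visible a few instructions later, so
at worst a single job pick uses the stale priority.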

> Also, addressing the root cause of the lock inversion would IMHO mean 
> re-designing the whole scheduler, and this specific function does not 
> seem like it should be the trigger for that.

I'm not sure it is that bad; let me take a look in the next few days and see
what options we have.
