On 15/01/2026 11:44, Danilo Krummrich wrote:
On Thu Jan 15, 2026 at 9:28 AM CET, Tvrtko Ursulin wrote:

On 14/01/2026 22:13, Danilo Krummrich wrote:
On Fri Dec 19, 2025 at 2:53 PM CET, Tvrtko Ursulin wrote:
diff --git a/drivers/gpu/drm/scheduler/sched_rq.c b/drivers/gpu/drm/scheduler/sched_rq.c
index 2d1f579d8352..2fde309d02a6 100644
--- a/drivers/gpu/drm/scheduler/sched_rq.c
+++ b/drivers/gpu/drm/scheduler/sched_rq.c
@@ -16,6 +16,35 @@ drm_sched_entity_compare_before(struct rb_node *a, const struct rb_node *b)
        return ktime_before(ea->oldest_job_waiting, eb->oldest_job_waiting);
   }
+static void drm_sched_rq_update_prio(struct drm_sched_rq *rq)
+{
+       enum drm_sched_priority prio = DRM_SCHED_PRIORITY_INVALID;
+       struct rb_node *rb;
+
+       lockdep_assert_held(&rq->lock);
+
+       rb = rb_first_cached(&rq->rb_tree_root);
+       if (rb) {
+               struct drm_sched_entity *entity =
+                       rb_entry(rb, typeof(*entity), rb_tree_node);
+
+               /*
+                * The normal locking order is entity then run-queue, so taking
+                * the entity lock here would be a locking inversion for the
+                * case when the current head of the run-queue is different from
+                * the one we already have locked. The unlocked read is fine
+                * though, because if the priority had just changed it is no big
+                * deal for our algorithm, just a transient state reachable only
+                * by drivers with a userspace dynamic priority change API. Equal
+                * in effect to the priority change becoming visible a few
+                * instructions later.
+                */
+               prio = READ_ONCE(entity->priority);

I still think that we should address the root cause of the lock inversion
problem instead.

I previously mentioned that I could take a look at this at the beginning of this year, which I can do soon.

In the meantime, can you please explain what's the problem with this specific
case? This function is only ever called from drm_sched_rq_remove_fifo_locked()
and drm_sched_rq_update_fifo_locked(), which already seem to hold both locks.

The entity which is locked is likely not the same as the entity at the head of the run-queue in either call chain.

In one case we have just removed the locked entity from the run-queue, while in the other the tree has been re-balanced so a different entity may have taken the head position.
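
To make the two lock orders concrete, here is a minimal sketch. The struct fields are as I remember them from the current scheduler code, but the two helpers are made up purely for illustration and are not part of this series:

#include <linux/rbtree.h>
#include <linux/spinlock.h>
#include <drm/gpu_scheduler.h>

/* Illustrative only, not actual scheduler code. */

/* The normal order used when an entity updates its own rq position. */
static void illustrate_entity_then_rq(struct drm_sched_entity *entity)
{
	spin_lock(&entity->lock);
	spin_lock(&entity->rq->lock);
	/* ... re-position entity in the run-queue ... */
	spin_unlock(&entity->rq->lock);
	spin_unlock(&entity->lock);
}

/*
 * A hypothetical "locked" variant of drm_sched_rq_update_prio(). It runs
 * with rq->lock already held, and the head of the rb-tree is in general a
 * *different* entity than the one the caller has locked, so locking it
 * here would nest entity under rq, i.e. the reverse of the order above.
 */
static void illustrate_rq_then_head_entity(struct drm_sched_rq *rq)
{
	struct rb_node *rb = rb_first_cached(&rq->rb_tree_root);

	if (rb) {
		struct drm_sched_entity *head =
			rb_entry(rb, struct drm_sched_entity, rb_tree_node);

		spin_lock(&head->lock);		/* rq -> entity: inversion */
		/* ... read head->priority ... */
		spin_unlock(&head->lock);
	}
}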

Ick! That makes it even worse, because it would mean that even if we were able to take the entity lock here, this would also be prone to lock inversion between entities.

Not sure what you mean by "also" and "even worse"? It is the same entity->rq vs rq->entity lock order inversion. What other inversion is there?

I.e. that is a huge indicator that it is even more necessary to revisit the locking design in general.

Also to note is that in 99% of cases entity->priority is invariant. Only amdgpu allows changing the priority after entity creation, so for the rest locking would not gain anything.

Even for amdgpu the unlocked read is not very relevant, since the only thing it is used for is to determine the run-queue insertion position of a re-joining entity. So the worst that could happen, if a userspace thread were to race a priority change against the job worker picking the next job, is to *one time* pick a different job.
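
For illustration, the race is roughly the following sketch. set_priority() is a made-up stand-in for the driver's priority override path, not amdgpu's actual code, and DRM_SCHED_PRIORITY_INVALID is the value introduced in the patch above:

#include <linux/rbtree.h>
#include <linux/spinlock.h>
#include <drm/gpu_scheduler.h>

/* Illustrative only; simplified stand-ins, not the actual driver paths. */

/* Userspace-triggered priority override (amdgpu-style). */
static void set_priority(struct drm_sched_entity *entity,
			 enum drm_sched_priority prio)
{
	spin_lock(&entity->lock);
	WRITE_ONCE(entity->priority, prio);
	spin_unlock(&entity->lock);
}

/* Job worker side, called with rq->lock held but not entity->lock. */
static enum drm_sched_priority peek_head_priority(struct drm_sched_rq *rq)
{
	struct rb_node *rb = rb_first_cached(&rq->rb_tree_root);
	struct drm_sched_entity *head;

	if (!rb)
		return DRM_SCHED_PRIORITY_INVALID;

	head = rb_entry(rb, struct drm_sched_entity, rb_tree_node);

	/*
	 * If set_priority() races with us we may observe either the old or
	 * the new value; the effect is the same as the change becoming
	 * visible a few instructions later, so at most one job is picked
	 * according to the stale priority.
	 */
	return READ_ONCE(head->priority);
}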

I get that; it is less that dealing with the priority field by itself is a huge issue we can't handle, it is more that the above workaround clearly points at a (locking) design issue which we should not ignore. It's not only about the code working or being correct, it's also about maintainability.

(Even though I'm aware that DRM scheduler maintainability is a bit like the DRM equivalent of the infamous 3x+1 problem. :)

Also, to address the root cause of the lock inversion would IMHO be to re-design the whole scheduler, and this specific function does not seem like it should be the trigger for that.

I'm not sure it is that bad; let me take a look in the coming days and see what options we have.

Okay, but I am sure you know there are memory barriers, RCU, the SPSC queue, completions, worker management and at least two locks involved, and everything is interdependent. This series at least removes a bunch of code without making things more complicated, so it can be a good base for further rework. If you suggest holding it until all of the above is resolved, it will easily be a few more years.

Regards,

Tvrtko
