On 28/07/2025 10:28, Pierre-Eric Pelloux-Prayer wrote:
On 24/07/2025 at 16:19, Tvrtko Ursulin wrote:
GPUs generally don't implement preemption and the DRM scheduler definitely
does not support it at the front end scheduling level. This means
execution quanta can be quite long and are controlled by userspace, a
consequence of which is that picking the "wrong" entity to run can have a
larger negative effect than it would have with a virtual runtime based CPU
scheduler.

Another important consideration is that rendering clients often have
shallow submission queues, meaning they will be entering and exiting the
scheduler's runnable queue often.

The relevant scenario here is what happens when an entity re-joins the
runnable queue with other entities already present. One cornerstone of the
virtual runtime algorithm is to let it re-join at the head and depend on
the virtual runtime accounting to sort out the order after an execution
quantum or two.

However, as explained above, this may not work fully reliably in the GPU
world. An entity could always overtake the existing entities, or never,
depending on the submission order and the rbtree's equal-key insertion
behaviour.

We can break this latching by adding some randomness for this specific
corner case.

If an entity is re-joining the runnable queue, was at the head of the
queue the last time it was picked, and a different entity of equal
scheduling priority is already queued, we can break the tie by randomly
choosing the execution order between the two.

For randomness we implement a simple driver global boolean which selects
whether the new entity will go first or not. Because the boolean is global
and shared between all run queues and entities, its actual effect can be
loosely called random, under the assumption that it will not always be the
same entity re-joining the queue under these circumstances.

Another way to look at this is that it is adding a little bit of limited
random round-robin behaviour to the fair scheduling algorithm.

The net effect is a significant improvement in the scheduling unit tests
which check the scheduling quality for an interactive client running in
parallel with GPU hogs.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursu...@igalia.com>
Cc: Christian König <christian.koe...@amd.com>
Cc: Danilo Krummrich <d...@kernel.org>
Cc: Matthew Brost <matthew.br...@intel.com>
Cc: Philipp Stanner <pha...@kernel.org>
Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-pra...@amd.com>
---
  drivers/gpu/drm/scheduler/sched_rq.c | 10 ++++++++++
  1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_rq.c b/drivers/gpu/drm/scheduler/sched_rq.c
index d16ee3ee3653..087a6bdbb824 100644
--- a/drivers/gpu/drm/scheduler/sched_rq.c
+++ b/drivers/gpu/drm/scheduler/sched_rq.c
@@ -147,6 +147,16 @@ drm_sched_entity_restore_vruntime(struct drm_sched_entity *entity,
               * Higher priority can go first.
               */
              vruntime = -us_to_ktime(rq_prio - prio);
+        } else {
+            static const int shuffle[2] = { -100, 100 };
+            static bool r = 0;
+
+            /*
+             * For equal priority apply some randomness to break
+             * latching caused by submission patterns.
+             */
+            vruntime = shuffle[r];
+            r ^= 1;

I don't understand why this is needed at all?

I suppose this is related to how drm_sched_entity_save_vruntime saves a relative vruntime (otherwise an entity re-joining with a vruntime of 0 would be impossible), but I don't understand that either.

Two things (and a bit more) to explain here for the record. And, as agreed off-line, I need to add some more code comments for this area in the next respin.

First, the saving of "vruntime - min_vruntime" when an entity exits the run-queue.

That is a core CFS concept AFAIU which enables the relative position of the entity to be restored once it re-enters the rq.

It only applies in the scenario where the picked entity was not at the head of the queue, because the actual head was not runnable due to a dependency.

If the picked entity then leaves the queue and re-joins, this relative vruntime is used to put it back where it was relative to the unready entity (which may have become ready by now, in which case it needs to be picked next and not be overtaken so easily).

It has to be the relative vruntime that is preserved, i.e. an entity which re-enters cannot simply keep its previous absolute vruntime, since by then that could lag significantly behind the vruntime of other active entities, which in turn would mean the re-joining entity could stay at the head of the queue for a long time.
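A rough sketch of the idea in pseudo-kernel-C (not the exact code from the series; the entity->stats->vruntime field and the exact signatures are my shorthand here):

	/* On leaving the rq: keep only the offset from min_vruntime. */
	static void drm_sched_entity_save_vruntime(struct drm_sched_entity *entity,
						   ktime_t min_vruntime)
	{
		entity->stats->vruntime = ktime_sub(entity->stats->vruntime,
						    min_vruntime);
	}

	/* On re-joining: re-base the saved offset onto the current minimum. */
	static ktime_t
	drm_sched_entity_restore_vruntime(struct drm_sched_entity *entity,
					  ktime_t min_vruntime)
	{
		ktime_t vruntime = ktime_add(min_vruntime,
					     entity->stats->vruntime);

		entity->stats->vruntime = vruntime;

		return vruntime;
	}

So an entity which was, say, 2ms behind the head when it left comes back 2ms behind whatever the head is now, instead of dragging in a stale absolute vruntime.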

The second part is the special case from the quoted patch, which only applies to entities re-joining the queue after having been picked from the head _and_ when there is another entity already in the rq.

By the nature of the CFS algorithm the re-joining entity continues with the vruntime assigned from the current rq min_vruntime. That puts two entities with exactly the same vruntime at the head of the queue, with the actual picking order influenced by the submit order (FIFO) and the rbtree sort order for equal keys (I did not check which). In any case it is not desirable, for all the GPU scheduling weaknesses described in the commit text (this patch).

For this special case there are three sub-paths:

1. Re-joining entity is higher scheduling prio -> we pull its vruntime a tiny bit ahead of the min_vruntime so it runs first.

2. Lower re-joining prio -> the opposite of the above: we explicitly prevent it from overtaking the higher priority head.

3. Equal prio -> apply some randomness as to which one runs first.

The idea is to avoid any "latching" of the execution order based on submission patterns, which in effect applies a little bit of round-robin/random-robin behaviour to this very specific case of an equal priority entity re-joining at the top of the queue.
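Putting the three sub-paths together, the restore path ends up looking roughly like this. Branches 1 and 3 match the quoted hunk; branch 2 is my paraphrase of the opposite case, and I am assuming the inverted priority enum where a numerically smaller prio means a higher scheduling priority, which is what the quoted hunk implies:

	if (prio < rq_prio) {
		/*
		 * 1. Re-joiner has higher priority: a small negative
		 *    offset puts it just ahead of min_vruntime.
		 */
		vruntime = -us_to_ktime(rq_prio - prio);
	} else if (prio > rq_prio) {
		/*
		 * 2. Re-joiner has lower priority: a small positive
		 *    offset keeps it behind the head.
		 */
		vruntime = us_to_ktime(prio - rq_prio);
	} else {
		/*
		 * 3. Equal priority: alternate who goes first to break
		 *    the latching described above.
		 */
		static const int shuffle[2] = { -100, 100 };
		static bool r;

		vruntime = shuffle[r];
		r ^= 1;
	}

The +-100ns shuffle values are tiny on purpose: just enough to give the rbtree two distinct keys so the pick order alternates, without meaningfully perturbing the vruntime accounting.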

Regards,

Tvrtko
