On Fri, 2025-08-22 at 15:43 +0200, Pierre-Eric Pelloux-Prayer wrote:
> Currently, the scheduler score is incremented when a job is pushed to an
> entity and when an entity is attached to the scheduler.

It's indeed odd that attaching an entity is treated the same as
submitting a job.

Can you expand the documentation for drm_sched_init_args a bit, so that
it becomes clearer what the score is supposed to represent?

> 
> This leads to some bad scheduling decision where the score value is
> largely made of idle entities.
> 
> For instance, a scenario with 2 schedulers and where 10 entities submit
> a single job, then do nothing, each scheduler will probably end up with
> a score of 5.
> Now, 5 userspace apps exit, so their entities will be dropped. 
> 

"entities will be dropped" == "drm_sched_entity_kill() gets called",
right?

> In
> the worst case, these apps' entities where all attached to the same

s/where/were

or better yet: "could be"

> scheduler and we end up with score=5 (the 5 remaining entities) and
> score=0, despite the 2 schedulers being idle.

That does indeed sound like a (small) problem to me.


> When new entities show up, they will all select the second scheduler
> based on its low score value, instead of alternating between the 2.
> 
> Some amdgpu rings depended on this feature, but the previous commit
> implemented the same thing in amdgpu directly so it can be safely
> removed from drm/sched.

Can we be that sure that no other driver depends on it, though? I
suspect it's just amdgpu, but…



BTW, since you're currently cleaning up related code: it seems that
amdgpu is the only driver that sets &struct drm_sched_init_args.score.
It would be great if you could take a look at whether that's still
needed.


P.

> 
> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-pra...@amd.com>
> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 5a550fd76bf0..e6d232a8ec58 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -206,7 +206,6 @@ void drm_sched_rq_add_entity(struct drm_sched_rq *rq,
>       if (!list_empty(&entity->list))
>               return;
>  
> -     atomic_inc(rq->sched->score);
>       list_add_tail(&entity->list, &rq->entities);
>  }
>  
> @@ -228,7 +227,6 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
>  
>       spin_lock(&rq->lock);
>  
> -     atomic_dec(rq->sched->score);
>       list_del_init(&entity->list);
>  
>       if (rq->current_entity == entity)
