On Wed, 2025-09-03 at 11:18 +0100, Tvrtko Ursulin wrote:
> To implement fair scheduling we will need as accurate a view as
> possible into per entity GPU time utilisation. Because sched fence
> execution times are only adjusted for accuracy in the free worker, we
> need to process completed jobs as soon as possible so the metric is
> most up to date when viewed from the submission side of things.
> 
> Signed-off-by: Tvrtko Ursulin <[email protected]>
Looks like a good patch to me.

> Cc: Christian König <[email protected]>
> Cc: Danilo Krummrich <[email protected]>
> Cc: Matthew Brost <[email protected]>
> Cc: Philipp Stanner <[email protected]>
> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 13 ++-----------
>  1 file changed, 2 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 97bec2f86502..9411676e772a 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -906,7 +906,6 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched)
>   * drm_sched_get_finished_job - fetch the next finished job to be destroyed
>   *
>   * @sched: scheduler instance
> - * @have_more: are there more finished jobs on the list
>   *
>   * Informs the caller through @have_more whether there are more finished jobs
>   * besides the returned one.
> @@ -915,7 +914,7 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched)
>   * ready for it to be destroyed.
>   */
>  static struct drm_sched_job *
> -drm_sched_get_finished_job(struct drm_gpu_scheduler *sched, bool *have_more)
> +drm_sched_get_finished_job(struct drm_gpu_scheduler *sched)
>  {
>  	struct drm_sched_job *job, *next;
> 
> @@ -930,7 +929,6 @@ drm_sched_get_finished_job(struct drm_gpu_scheduler *sched, bool *have_more)
>  		/* cancel this job's TO timer */
>  		cancel_delayed_work(&sched->work_tdr);
> 
> -		*have_more = false;

That have_more mechanism was introduced by you, Tvrtko, in 36caa026b28a4,
to avoid a double-lock. AFAICS this patch does not revert the advantage
of that older improvement. So if you think that's fine, there are no
objections from my side.

The only thing I am wondering about is whether drm_sched_free_job_work()
might be running unnecessarily often after this patch. It's kicked off
through drm_sched_job_done() at various places. But I tend to think that
investigating optimizations there can be left for future work,
considering that it's running in a workqueue. But please share your
opinion on that if you have one.

P.

>  		next = list_first_entry_or_null(&sched->pending_list,
>  						typeof(*next), list);
>  		if (next) {
> @@ -940,8 +938,6 @@ drm_sched_get_finished_job(struct drm_gpu_scheduler *sched, bool *have_more)
>  			next->s_fence->scheduled.timestamp =
>  				dma_fence_timestamp(&job->s_fence->finished);
> 
> -			*have_more = dma_fence_is_signaled(&next->s_fence->finished);
> -
>  			/* start TO timer for next job */
>  			drm_sched_start_timeout(sched);
>  		}
> @@ -1000,14 +996,9 @@ static void drm_sched_free_job_work(struct work_struct *w)
>  	struct drm_gpu_scheduler *sched =
>  		container_of(w, struct drm_gpu_scheduler, work_free_job);
>  	struct drm_sched_job *job;
> -	bool have_more;
> 
> -	job = drm_sched_get_finished_job(sched, &have_more);
> -	if (job) {
> +	while ((job = drm_sched_get_finished_job(sched)))
>  		sched->ops->free_job(job);
> -		if (have_more)
> -			drm_sched_run_free_queue(sched);
> -	}
> 
>  	drm_sched_run_job_queue(sched);
>  }
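
P.S. For anyone skimming the thread, the gist of the change is "drain the
pending list in one work invocation instead of freeing a single job and
conditionally re-queueing the work item". Below is a rough standalone
sketch of that drain pattern; the types and helpers (job_list,
fetch_finished_job, free_job) are placeholders for illustration only and
are not the actual scheduler code.

#include <pthread.h>
#include <stdlib.h>

struct job {
	struct job *next;
	int finished;		/* set by the completion path */
};

struct job_list {
	pthread_mutex_t lock;
	struct job *head;
};

/* Pop one finished job, or NULL -- analogous to drm_sched_get_finished_job(). */
static struct job *fetch_finished_job(struct job_list *l)
{
	struct job *j = NULL;

	pthread_mutex_lock(&l->lock);
	if (l->head && l->head->finished) {
		j = l->head;
		l->head = j->next;
	}
	pthread_mutex_unlock(&l->lock);

	return j;
}

static void free_job(struct job *j)
{
	free(j);
}

/*
 * Work handler: instead of freeing a single job and conditionally
 * re-queueing itself (the old have_more mechanism), it loops until no
 * finished job is left, so completed jobs are reaped as soon as possible.
 */
static void free_job_work(struct job_list *l)
{
	struct job *j;

	while ((j = fetch_finished_job(l)))
		free_job(j);
}

int main(void)
{
	struct job_list l = { .lock = PTHREAD_MUTEX_INITIALIZER, .head = NULL };

	/* Queue two already-finished jobs and let the handler reap both. */
	for (int i = 0; i < 2; i++) {
		struct job *j = calloc(1, sizeof(*j));

		j->finished = 1;
		j->next = l.head;
		l.head = j;
	}

	free_job_work(&l);

	return 0;
}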
