On 11/07/2025 14:04, Philipp Stanner wrote:
Late to the party; had overlooked that the discussion with Matt is
resolved. Some comments below
On Tue, 2025-07-08 at 13:20 +0100, Tvrtko Ursulin wrote:
Currently the job free work item will lock sched->job_list_lock a first time
to see if there are any jobs, free a single job, and then lock again to
decide whether to re-queue itself if there are more finished jobs.
Since drm_sched_get_finished_job() already looks at the second job in the
queue we can simply add the signaled check and have it return the presence
of more jobs to free to the caller. That way the work item does not
optional nit:
s/to free/to be freed
Reads a bit more cleanly.
Done.
have to lock the list again and repeat the signaled check.
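For readers skimming the thread, here is the idea as a minimal, self-contained
userspace sketch (a pthread mutex standing in for the spinlock; item, queue_ctx
and get_finished are made-up names, not the scheduler code): the dequeue helper
reports, under the same lock acquisition, whether more completed entries remain,
so the caller never re-takes the lock just to decide about re-queueing.

#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct item {
	struct item *next;
	bool done;
};

struct queue_ctx {
	pthread_mutex_t lock;
	struct item *head;
};

/*
 * Pop one completed item and, while still holding the lock, report
 * whether another completed item follows it.
 */
static struct item *get_finished(struct queue_ctx *q, bool *have_more)
{
	struct item *it = NULL;

	pthread_mutex_lock(&q->lock);
	if (q->head && q->head->done) {
		it = q->head;
		q->head = it->next;
		/* Peek at the next entry under the same lock acquisition. */
		*have_more = q->head && q->head->done;
	}
	pthread_mutex_unlock(&q->lock);

	return it;
}

static void free_work(struct queue_ctx *q)
{
	bool have_more;
	struct item *it = get_finished(q, &have_more);

	if (it) {
		printf("freeing %p\n", (void *)it);
		if (have_more)
			printf("more finished items, re-queue the work\n");
	}
}

int main(void)
{
	struct item b = { .next = NULL, .done = true };
	struct item a = { .next = &b, .done = true };
	struct queue_ctx q = { .lock = PTHREAD_MUTEX_INITIALIZER, .head = &a };

	free_work(&q);
	return 0;
}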
Signed-off-by: Tvrtko Ursulin <tvrtko.ursu...@igalia.com>
Cc: Christian König <christian.koe...@amd.com>
Cc: Danilo Krummrich <d...@kernel.org>
Cc: Matthew Brost <matthew.br...@intel.com>
Cc: Philipp Stanner <pha...@kernel.org>
---
drivers/gpu/drm/scheduler/sched_main.c | 37 ++++++++++----------------
1 file changed, 14 insertions(+), 23 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 1f077782ec12..1bce0b66f89c 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -366,22 +366,6 @@ static void __drm_sched_run_free_queue(struct drm_gpu_scheduler *sched)
queue_work(sched->submit_wq, &sched->work_free_job);
}
-/**
- * drm_sched_run_free_queue - enqueue free-job work if ready
- * @sched: scheduler instance
- */
-static void drm_sched_run_free_queue(struct drm_gpu_scheduler *sched)
The function name is now free. See my comment at the bottom.
-{
- struct drm_sched_job *job;
-
- spin_lock(&sched->job_list_lock);
- job = list_first_entry_or_null(&sched->pending_list,
- struct drm_sched_job, list);
- if (job && dma_fence_is_signaled(&job->s_fence->finished))
- __drm_sched_run_free_queue(sched);
- spin_unlock(&sched->job_list_lock);
-}
-
/**
* drm_sched_job_done - complete a job
* @s_job: pointer to the job which is done
@@ -1102,12 +1086,13 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched)
* drm_sched_get_finished_job - fetch the next finished job to be destroyed
*
* @sched: scheduler instance
+ * @have_more: are there more finished jobs on the list
I'd like a very brief sentence below here like:
"Informs the caller through @have_more whether there are more finished
jobs besides the returned one."
Reason being that it's relatively rare in the kernel that status is not
transmitted through a return value, so we want that to be very obvious.
Done.
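For completeness, the kerneldoc could then read roughly as follows (a sketch of
the wording only, not necessarily the final text of the next revision):

/**
 * drm_sched_get_finished_job - fetch the next finished job to be destroyed
 *
 * @sched: scheduler instance
 * @have_more: are there more finished jobs on the list
 *
 * Informs the caller through @have_more whether there are more finished
 * jobs besides the returned one.
 *
 * Returns the next finished job from the pending list (if there is one)
 * ready for it to be destroyed.
 */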
*
* Returns the next finished job from the pending list (if there is one)
* ready for it to be destroyed.
*/
static struct drm_sched_job *
-drm_sched_get_finished_job(struct drm_gpu_scheduler *sched)
+drm_sched_get_finished_job(struct drm_gpu_scheduler *sched, bool *have_more)
{
struct drm_sched_job *job, *next;
@@ -1115,22 +1100,25 @@ drm_sched_get_finished_job(struct drm_gpu_scheduler *sched)
job = list_first_entry_or_null(&sched->pending_list,
struct drm_sched_job, list);
-
if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
/* remove job from pending_list */
list_del_init(&job->list);
/* cancel this job's TO timer */
cancel_delayed_work(&sched->work_tdr);
- /* make the scheduled timestamp more accurate */
+
+ *have_more = false;
Don't we want that bool initialized to false at the very beginning of
the function? That way it can never be forgotten if the code gets
reworked.
I opted to leave this as is, given that the kerneldoc is clear this is only
valid if a job was returned.
next = list_first_entry_or_null(&sched->pending_list,
typeof(*next), list);
-
if (next) {
+ /* make the scheduled timestamp more accurate */
if (test_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT,
&next->s_fence->scheduled.flags))
next->s_fence->scheduled.timestamp =
dma_fence_timestamp(&job->s_fence->finished);
+
+ *have_more = dma_fence_is_signaled(&next->s_fence->finished);
+
/* start TO timer for next job */
drm_sched_start_timeout(sched);
}
@@ -1189,12 +1177,15 @@ static void drm_sched_free_job_work(struct work_struct *w)
struct drm_gpu_scheduler *sched =
container_of(w, struct drm_gpu_scheduler, work_free_job);
struct drm_sched_job *job;
+ bool have_more;
- job = drm_sched_get_finished_job(sched);
- if (job)
+ job = drm_sched_get_finished_job(sched, &have_more);
+ if (job) {
sched->ops->free_job(job);
+ if (have_more)
+ __drm_sched_run_free_queue(sched);
Now that drm_sched_run_free_queue() is dead, it's an excellent
opportunity to give its name to __drm_sched_run_free_queue() \o/
Cleaner namespace, and reads better with the below
drm_sched_run_job_queue().
Well spotted - done.
Besides, cool patch!
Thanks!
Regards,
Tvrtko
+ }
- drm_sched_run_free_queue(sched);
drm_sched_run_job_queue(sched);
}