On 12/05/2025 13:53, Philipp Stanner wrote:
On Fri, 2025-04-25 at 11:20 +0100, Tvrtko Ursulin wrote:
Reduce to a single spin_unlock for a hopefully somewhat clearer flow in
the function. It may appear that there is a behavioural change, with
drm_sched_start_timeout_unlocked() now not being called if there were
initially no jobs on the pending list and some appeared after the
unlock. However, if the code relied on the TDR handler restarting
itself, it would equally fail to do so when the job arrived on the
pending list only after the check.
Also fix one stale comment while touching the function.
Same here, that's a good candidate for a separate patch / series.
It conflicts with the in progress work from Maíra (fixing memory leaks
on false timeouts) so I will keep this one on the back-burner until her
work lands.
Regards,
Tvrtko
Signed-off-by: Tvrtko Ursulin <tvrtko.ursu...@igalia.com>
Cc: Christian König <christian.koe...@amd.com>
Cc: Danilo Krummrich <d...@kernel.org>
Cc: Matthew Brost <matthew.br...@intel.com>
Cc: Philipp Stanner <pha...@kernel.org>
---
 drivers/gpu/drm/scheduler/sched_main.c | 37 +++++++++++++-------------
 1 file changed, 18 insertions(+), 19 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index a45b02fd2af3..a26cc11c8ade 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -516,38 +516,37 @@ static void drm_sched_job_begin(struct drm_sched_job *s_job)
 static void drm_sched_job_timedout(struct work_struct *work)
 {
-        struct drm_gpu_scheduler *sched;
+        struct drm_gpu_scheduler *sched =
+                container_of(work, struct drm_gpu_scheduler, work_tdr.work);
+        enum drm_gpu_sched_stat status;
         struct drm_sched_job *job;
-        enum drm_gpu_sched_stat status = DRM_GPU_SCHED_STAT_NOMINAL;
-
-        sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
 
         /* Protects against concurrent deletion in drm_sched_get_finished_job */
         spin_lock(&sched->job_list_lock);
         job = list_first_entry_or_null(&sched->pending_list,
                                        struct drm_sched_job, list);
-
         if (job) {
                 /*
                  * Remove the bad job so it cannot be freed by concurrent
-                 * drm_sched_cleanup_jobs. It will be reinserted back after sched->thread
-                 * is parked at which point it's safe.
+                 * drm_sched_get_finished_job. It will be reinserted back after
+                 * scheduler worker is stopped at which point it's safe.
                  */
                 list_del_init(&job->list);
-                spin_unlock(&sched->job_list_lock);
+        }
+        spin_unlock(&sched->job_list_lock);
 
-                status = job->sched->ops->timedout_job(job);
+        if (!job)
+                return;
 
-                /*
-                 * Guilty job did complete and hence needs to be manually removed
-                 * See drm_sched_stop doc.
-                 */
-                if (sched->free_guilty) {
-                        job->sched->ops->free_job(job);
-                        sched->free_guilty = false;
-                }
-        } else {
-                spin_unlock(&sched->job_list_lock);
+        status = job->sched->ops->timedout_job(job);
+
+        /*
+         * Guilty job did complete and hence needs to be manually removed. See
+         * documentation for drm_sched_stop.
+         */
+        if (sched->free_guilty) {
+                job->sched->ops->free_job(job);
+                sched->free_guilty = false;
         }
 
         if (status != DRM_GPU_SCHED_STAT_ENODEV)