On Mon, 2025-05-12 at 11:04 -0300, Maíra Canal wrote:
> Hi Philipp,
> 
> On 12/05/25 08:13, Philipp Stanner wrote:
> > On Tue, 2025-05-06 at 07:32 -0700, Matthew Brost wrote:
> > > On Mon, May 05, 2025 at 07:41:09PM -0700, Matthew Brost wrote:
> > > > On Sat, May 03, 2025 at 05:59:52PM -0300, Maíra Canal wrote:
> > > > > When the DRM scheduler times out, it's possible that the GPU
> > > > > isn't hung; instead, a job may still be running, and there may
> > > > > be no valid reason to reset the hardware. This can occur in two
> > > > > situations:
> > > > > 
> > > > >    1. The GPU exposes some mechanism that ensures the GPU is
> > > > >       still making progress. By checking this mechanism, we can
> > > > >       safely skip the reset, rearm the timeout, and allow the
> > > > >       job to continue running until completion. This is the
> > > > >       case for v3d and Etnaviv.
> > > > >    2. TDR has fired before the IRQ that signals the fence.
> > > > >       Consequently, the job actually finishes, but it triggers
> > > > >       a timeout before signaling the completion fence.
> > > > > 
> > > > 
> > > > We have both of these cases in Xe too. We implement the requeuing
> > > > in Xe via a driver-side function - xe_sched_add_pending_job - but
> > > > this looks better and we will make use of this.
> > > > 
> > > > > These two scenarios are problematic because we remove the job
> > > > > from the `sched->pending_list` before calling
> > > > > `sched->ops->timedout_job()`. This means that when the job
> > > > > finally signals completion (e.g. in the IRQ handler), the
> > > > > scheduler won't call `sched->ops->free_job()`. As a result, the
> > > > > job and its resources won't be freed, leading to a memory leak.
> > > > > 
> > > > > To resolve this issue, we create a new `drm_gpu_sched_stat`
> > > > > that allows a driver to skip the reset. This new status will
> > > > > indicate that the job should be reinserted into the pending
> > > > > list, and the driver will still signal its completion.
> > > > > 
> > > > > Signed-off-by: Maíra Canal <mca...@igalia.com>
> > > > 
> > > > Reviewed-by: Matthew Brost <matthew.br...@intel.com>
> > > > 
> > > 
> > > Wait - never mind, I think one issue is below.
> > > 
> > > > > ---
> > > > >   drivers/gpu/drm/scheduler/sched_main.c | 14 ++++++++++++++
> > > > >   include/drm/gpu_scheduler.h            |  2 ++
> > > > >   2 files changed, 16 insertions(+)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > > > > index 829579c41c6b5d8b2abce5ad373c7017469b7680..68ca827d77e32187a034309f881135dbc639a9b4 100644
> > > > > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > > > > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > > > > @@ -568,6 +568,17 @@ static void drm_sched_job_timedout(struct work_struct *work)
> > > > >  			job->sched->ops->free_job(job);
> > > > >  			sched->free_guilty = false;
> > > > >  		}
> > > > > +
> > > > > +		/*
> > > > > +		 * If the driver indicated that the GPU is still running and
> > > > > +		 * wants to skip the reset, reinsert the job back into the
> > > > > +		 * pending list and rearm the timeout.
> > > > > +		 */
> > > > > +		if (status == DRM_GPU_SCHED_STAT_RUNNING) {
> > > > > +			spin_lock(&sched->job_list_lock);
> > > > > +			list_add(&job->list, &sched->pending_list);
> > > > > +			spin_unlock(&sched->job_list_lock);
> > > > > +		}
> > > 
> > > I think you need to requeue the free_job wq here. It is possible
> > > that the free_job wq ran, didn't find a job, and went to sleep;
> > > then we add a signaled job here which will never get freed.
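> > > 
> > > Something along these lines after the reinsert, maybe (untested;
> > > submit_wq and work_free_job being the fields of the workqueue-based
> > > scheduler):
> > > 
> > > 	if (status == DRM_GPU_SCHED_STAT_RUNNING) {
> > > 		spin_lock(&sched->job_list_lock);
> > > 		list_add(&job->list, &sched->pending_list);
> > > 		spin_unlock(&sched->job_list_lock);
> > > 
> > > 		/*
> > > 		 * Kick the free worker in case it already ran, found the
> > > 		 * pending list empty and went back to sleep before this
> > > 		 * job signaled.
> > > 		 */
> > > 		queue_work(sched->submit_wq, &sched->work_free_job);
> > > 	}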
> > 
> > I wonder if that could be solved by holding job_list_lock a bit
> > longer. free_job_work will try to check the list for the next
> > signaled job, but will wait for the lock.
> > 
> > If that works, we could completely rely on the standard mechanism
> > without requeuing, which would be neat.
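> > 
> > Roughly something like this in drm_sched_job_timedout() (untested
> > sketch, details like the guilty-job handling elided; note that
> > ops->timedout_job() would then run with job_list_lock held):
> > 
> > 	spin_lock(&sched->job_list_lock);
> > 	job = list_first_entry_or_null(&sched->pending_list,
> > 				       struct drm_sched_job, list);
> > 	if (job) {
> > 		list_del_init(&job->list);
> > 
> > 		status = job->sched->ops->timedout_job(job);
> > 
> > 		if (status == DRM_GPU_SCHED_STAT_RUNNING)
> > 			list_add(&job->list, &sched->pending_list);
> > 	}
> > 	spin_unlock(&sched->job_list_lock);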
> 
> I believe it works. However, the tradeoff would be holding the lock
> for the entire reset of the GPU (in the cases where the GPU actually
> hung), which looks like a lot of time.
> 
> Do you think it's reasonable to do so?

The scheduler only has three distinct work items: run_job, free_job and
timeout.

timeout runs only serially, so that's not relevant; and run_job() and
free_job() should be halted in the timeout handler through
drm_sched_stop() anyway.

Moreover, timeouts should be rare events.

So I'd say yes, the clarity of the code trumps here.
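
FWIW, with the new status a driver's timedout_job() could then look
roughly like this. Untested sketch; the foo_* names are made-up
stand-ins for whatever progress check and reset path the hardware
actually offers:

static enum drm_gpu_sched_stat
foo_timedout_job(struct drm_sched_job *sched_job)
{
	struct foo_job *job = to_foo_job(sched_job);

	/*
	 * The GPU is still making progress (e.g. a ring pointer moved
	 * since the last check). Skip the reset; per this patch the
	 * scheduler puts the job back onto the pending list.
	 */
	if (foo_gpu_is_making_progress(job))
		return DRM_GPU_SCHED_STAT_RUNNING;

	/* An actual hang, do the usual reset dance. */
	foo_gpu_reset(job);
	return DRM_GPU_SCHED_STAT_NOMINAL;
}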

Cheers,
P.


> 
> Best Regards,
> - Maíra
> 
> > 
> > P.
> > 
> > > 
> > > Matt
> > > 
> 
> 
