On Mon, 2025-05-12 at 16:09 +0200, Philipp Stanner wrote:
> On Mon, 2025-05-12 at 11:04 -0300, Maíra Canal wrote:
> > Hi Philipp,
> > 
> > On 12/05/25 08:13, Philipp Stanner wrote:
> > > On Tue, 2025-05-06 at 07:32 -0700, Matthew Brost wrote:
> > > > On Mon, May 05, 2025 at 07:41:09PM -0700, Matthew Brost wrote:
> > > > > On Sat, May 03, 2025 at 05:59:52PM -0300, Maíra Canal wrote:
> > > > > > When the DRM scheduler times out, it's possible that the GPU isn't
> > > > > > hung; instead, a job may still be running, and there may be no
> > > > > > valid reason to reset the hardware. This can occur in two
> > > > > > situations:
> > > > > > 
> > > > > >    1. The GPU exposes some mechanism that ensures the GPU is still
> > > > > >       making progress. By checking this mechanism, we can safely
> > > > > >       skip the reset, rearm the timeout, and allow the job to
> > > > > >       continue running until completion. This is the case for v3d
> > > > > >       and Etnaviv.
> > > > > >    2. TDR has fired before the IRQ that signals the fence.
> > > > > >       Consequently, the job actually finishes, but it triggers a
> > > > > >       timeout before signaling the completion fence.
> > > > > > 
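[For illustration: a rough sketch of what a driver-side ->timedout_job()
could look like with the new status this patch adds further down. All the
my_*() helpers, struct my_job/my_device and the done_fence field are made
up; only the callback signature, dma_fence_is_signaled() and
DRM_GPU_SCHED_STAT_NOMINAL are existing interfaces. Untested.]

        static enum drm_gpu_sched_stat
        my_timedout_job(struct drm_sched_job *sched_job)
        {
                struct my_job *job = to_my_job(sched_job);
                struct my_device *mydev = job->dev;

                /*
                 * Case 1: a progress indicator (e.g. a counter the hardware
                 * keeps bumping) shows the job is still running; skip the
                 * reset and let the scheduler rearm the timeout.
                 */
                if (my_dev_job_made_progress(mydev, job))
                        return DRM_GPU_SCHED_STAT_RUNNING;

                /*
                 * Case 2: the job actually finished; the timeout merely
                 * raced with the IRQ that signals the fence.
                 */
                if (dma_fence_is_signaled(job->done_fence))
                        return DRM_GPU_SCHED_STAT_RUNNING;

                /* A real hang: reset the hardware as before. */
                my_dev_reset(mydev);
                return DRM_GPU_SCHED_STAT_NOMINAL;
        }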
> > > > > 
> > > > > We have both of these cases in Xe too. We implement the requeuing in
> > > > > Xe via a driver-side function, xe_sched_add_pending_job(), but this
> > > > > looks better and we will make use of it.
> > > > > 
> > > > > > These two scenarios are problematic because we remove the job from
> > > > > > the `sched->pending_list` before calling `sched->ops->timedout_job()`.
> > > > > > This means that when the job finally signals completion (e.g. in the
> > > > > > IRQ handler), the scheduler won't call `sched->ops->free_job()`. As
> > > > > > a result, the job and its resources won't be freed, leading to a
> > > > > > memory leak.
> > > > > > 
> > > > > > To resolve this issue, we create a new `drm_gpu_sched_stat` that
> > > > > > allows a driver to skip the reset. This new status will indicate
> > > > > > that the job should be reinserted into the pending list, and the
> > > > > > driver will still signal its completion.
> > > > > > 
> > > > > > Signed-off-by: Maíra Canal <mca...@igalia.com>
> > > > > 
> > > > > Reviewed-by: Matthew Brost <matthew.br...@intel.com>
> > > > > 
> > > > 
> > > > Wait - never mind, I think there is one issue below.
> > > > 
> > > > > > ---
> > > > > >   drivers/gpu/drm/scheduler/sched_main.c | 14 ++++++++++++++
> > > > > >   include/drm/gpu_scheduler.h            |  2 ++
> > > > > >   2 files changed, 16 insertions(+)
> > > > > > 
> > > > > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > index 829579c41c6b5d8b2abce5ad373c7017469b7680..68ca827d77e32187a034309f881135dbc639a9b4 100644
> > > > > > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > @@ -568,6 +568,17 @@ static void drm_sched_job_timedout(struct work_struct *work)
> > > > > >  			job->sched->ops->free_job(job);
> > > > > >  			sched->free_guilty = false;
> > > > > >  		}
> > > > > > +
> > > > > > +		/*
> > > > > > +		 * If the driver indicated that the GPU is still running and wants to skip
> > > > > > +		 * the reset, reinsert the job back into the pending list and rearm the
> > > > > > +		 * timeout.
> > > > > > +		 */
> > > > > > +		if (status == DRM_GPU_SCHED_STAT_RUNNING) {
> > > > > > +			spin_lock(&sched->job_list_lock);
> > > > > > +			list_add(&job->list, &sched->pending_list);
> > > > > > +			spin_unlock(&sched->job_list_lock);
> > > > > > +		}
> > > > 
> > > > I think you need to requeue the free_job wq here. It is possible that
> > > > the free_job wq ran, didn't find a job and went to sleep, and then we
> > > > add a signaled job here which will never get freed.
> > > 
> > > I wonder if that could be solved by holding job_list_lock a bit longer.
> > > free_job_work will try to check the list for the next signaled job, but
> > > will wait for the lock.
> > > 
> > > If that works, we could completely rely on the standard mechanism
> > > without requeuing, which would be neat.
> > 
> > I believe it works. However, the tradeoff would be holding the lock for
> > the entire GPU reset (in the cases where the GPU actually hung), which
> > looks like a lot of time.
> > 
> > Do you think it's reasonable to do so?
> 
> The scheduler only has three distinct work items: run_job, free_job and
> timeout.
> 
> timeout runs only serially, so that's not relevant; and run_job() and
> free_job() should be halted in the timeout handler through
> drm_sched_stop() anyway.
> 
> Moreover, timeouts should be rare events.
> 
> So I'd say yes, the clarity of the code trumps here.

Forget about that, it turns out to be an insane idea. Thanks to Danilo
for the feedback.

Many drivers take locks in timedout_job(), for example
pvr_queue_timedout_job(). If we kept holding job_list_lock across that
callback, those driver locks would nest inside it, while other driver
paths may take them before calling back into the scheduler, which grabs
job_list_lock itself.

So that's insane deadlock danger. Not good.



So it seems the only option left is Matt's idea?

But that certainly needs an explanatory comment, too.
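
Something like this, perhaps (untested sketch on top of Maíra's hunk; I'm
assuming the workqueue-based scheduler here, where the free_job worker is
kicked via __drm_sched_run_free_queue(), i.e. queue_work(sched->submit_wq,
&sched->work_free_job)):

        if (status == DRM_GPU_SCHED_STAT_RUNNING) {
                spin_lock(&sched->job_list_lock);
                list_add(&job->list, &sched->pending_list);

                /*
                 * The job may have signaled while it was off the pending
                 * list, and the free_job worker may have run, found nothing
                 * and gone back to sleep. Kick it again so such a job still
                 * gets picked up and freed.
                 */
                if (dma_fence_is_signaled(&job->s_fence->finished))
                        __drm_sched_run_free_queue(sched);
                spin_unlock(&sched->job_list_lock);
        }

That would also document why the extra kick is needed.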


P.


> 
> Cheers,
> P.
> 
> 
> > 
> > Best Regards,
> > - Maíra
> > 
> > > 
> > > P.
> > > 
> > > > 
> > > > Matt
> > > > 
> > 
> > 
> 
