On Mon, 2025-05-12 at 11:04 -0300, Maíra Canal wrote:
> Hi Philipp,
> 
> On 12/05/25 08:13, Philipp Stanner wrote:
> > On Tue, 2025-05-06 at 07:32 -0700, Matthew Brost wrote:
> > > On Mon, May 05, 2025 at 07:41:09PM -0700, Matthew Brost wrote:
> > > > On Sat, May 03, 2025 at 05:59:52PM -0300, Maíra Canal wrote:
> > > > > When the DRM scheduler times out, it's possible that the GPU isn't hung;
> > > > > instead, a job may still be running, and there may be no valid reason to
> > > > > reset the hardware. This can occur in two situations:
> > > > > 
> > > > >   1. The GPU exposes some mechanism that ensures the GPU is still making
> > > > >      progress. By checking this mechanism, we can safely skip the reset,
> > > > >      rearm the timeout, and allow the job to continue running until
> > > > >      completion. This is the case for v3d and Etnaviv.
> > > > >   2. TDR has fired before the IRQ that signals the fence. Consequently,
> > > > >      the job actually finishes, but it triggers a timeout before signaling
> > > > >      the completion fence.
> > > > > 
> > > > 
> > > > We have both of these cases in Xe too. We implement the requeuing in Xe
> > > > via a driver-side function, xe_sched_add_pending_job, but this looks
> > > > better and we will make use of this.
> > > > 
> > > > > These two scenarios are problematic because we remove the job from the
> > > > > `sched->pending_list` before calling `sched->ops->timedout_job()`. This
> > > > > means that when the job finally signals completion (e.g. in the IRQ
> > > > > handler), the scheduler won't call `sched->ops->free_job()`. As a result,
> > > > > the job and its resources won't be freed, leading to a memory leak.
> > > > > 
> > > > > To resolve this issue, we create a new `drm_gpu_sched_stat` that allows a
> > > > > driver to skip the reset. This new status will indicate that the job
> > > > > should be reinserted into the pending list, and the driver will still
> > > > > signal its completion.
> > > > > 
> > > > > Signed-off-by: Maíra Canal <mca...@igalia.com>
> > > > 
> > > > Reviewed-by: Matthew Brost <matthew.br...@intel.com>
> > > > 
> > > 
> > > Wait - nevermind, I think one issue is below.
> > > 
> > > > > ---
> > > > >  drivers/gpu/drm/scheduler/sched_main.c | 14 ++++++++++++++
> > > > >  include/drm/gpu_scheduler.h            |  2 ++
> > > > >  2 files changed, 16 insertions(+)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > > > > index 829579c41c6b5d8b2abce5ad373c7017469b7680..68ca827d77e32187a034309f881135dbc639a9b4 100644
> > > > > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > > > > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > > > > @@ -568,6 +568,17 @@ static void drm_sched_job_timedout(struct work_struct *work)
> > > > >  			job->sched->ops->free_job(job);
> > > > >  			sched->free_guilty = false;
> > > > >  		}
> > > > > +
> > > > > +		/*
> > > > > +		 * If the driver indicated that the GPU is still running and wants
> > > > > +		 * to skip the reset, reinsert the job back into the pending list
> > > > > +		 * and rearm the timeout.
> > > > > +		 */
> > > > > +		if (status == DRM_GPU_SCHED_STAT_RUNNING) {
> > > > > +			spin_lock(&sched->job_list_lock);
> > > > > +			list_add(&job->list, &sched->pending_list);
> > > > > +			spin_unlock(&sched->job_list_lock);
> > > > > +		}
> > > 
> > > I think you need to requeue the free_job wq here. It is possible the
> > > free_job wq ran, didn't find a job, and went to sleep; then we add a
> > > signaled job here which will never get freed.
> > 
> > I wonder if that could be solved by holding job_list_lock a bit longer.
> > free_job_work will try to check the list for the next signaled job, but
> > will wait for the lock.
> > 
> > If that works, we could completely rely on the standard mechanism
> > without requeuing, which would be neat.
> 
> I believe it works. However, the tradeoff would be holding the lock for
> the entire reset of the GPU (in the cases where the GPU actually hung),
> which looks like a lot of time.
> 
> Do you think it's reasonable to do so?
The scheduler only has three distinct work items: run_job, free_job and
timeout. timeout only ever runs serially, so that's not relevant; and
run_job() and free_job() should be halted in the timeout handler through
drm_sched_stop() anyways. Moreover, timeouts should be rare events.

So I'd say yes, the clarity of the code trumps here.

Cheers,
P.

> 
> Best Regards,
> - Maíra
> 
> > 
> > P.
> > 
> > > 
> > > Matt
> > > 