Never mind, somehow I missed you're already doing that in this patch.

Reviewed-by: Iago Toral Quiroga <ito...@igalia.com>
El lun, 28-04-2025 a las 07:45 +0200, Iago Toral escribió:
> Hi Maira,
> 
> Looks good to me, but don't we need to do the same in
> v3d_csd_job_timedout?
> 
> Iago
> 
> El dom, 27-04-2025 a las 17:28 -0300, Maíra Canal escribió:
> > When a CL/CSD job times out, we check if the GPU has made any progress
> > since the last timeout. If so, instead of resetting the hardware, we skip
> > the reset and let the timer get rearmed. This gives long-running jobs a
> > chance to complete.
> > 
> > However, when `timedout_job()` is called, the job in question is removed
> > from the pending list, which means it won't be automatically freed through
> > `free_job()`. Consequently, when we skip the reset and keep the job
> > running, the job won't be freed when it finally completes.
> > 
> > This situation leads to a memory leak, as exposed in [1].
> > 
> > Similarly to commit 704d3d60fec4 ("drm/etnaviv: don't block scheduler when
> > GPU is still active"), this patch ensures the job is put back on the
> > pending list when extending the timeout.
> > 
> > Cc: sta...@vger.kernel.org # 6.0
> > Link: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12227 [1]
> > Reported-by: Daivik Bhatia <dtgs1...@gmail.com>
> > Signed-off-by: Maíra Canal <mca...@igalia.com>
> > ---
> >  drivers/gpu/drm/v3d/v3d_sched.c | 18 +++++++++++-------
> >  1 file changed, 11 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
> > index b3be08b0ca91..a98dcf9d75b1 100644
> > --- a/drivers/gpu/drm/v3d/v3d_sched.c
> > +++ b/drivers/gpu/drm/v3d/v3d_sched.c
> > @@ -734,11 +734,6 @@ v3d_gpu_reset_for_timeout(struct v3d_dev *v3d, struct drm_sched_job *sched_job)
> >  	return DRM_GPU_SCHED_STAT_NOMINAL;
> >  }
> >  
> > -/* If the current address or return address have changed, then the GPU
> > - * has probably made progress and we should delay the reset.  This
> > - * could fail if the GPU got in an infinite loop in the CL, but that
> > - * is pretty unlikely outside of an i-g-t testcase.
> > - */
> >  static enum drm_gpu_sched_stat
> >  v3d_cl_job_timedout(struct drm_sched_job *sched_job, enum v3d_queue q,
> >  		    u32 *timedout_ctca, u32 *timedout_ctra)
> > @@ -748,9 +743,16 @@ v3d_cl_job_timedout(struct drm_sched_job *sched_job, enum v3d_queue q,
> >  	u32 ctca = V3D_CORE_READ(0, V3D_CLE_CTNCA(q));
> >  	u32 ctra = V3D_CORE_READ(0, V3D_CLE_CTNRA(q));
> >  
> > +	/* If the current address or return address have changed, then the GPU
> > +	 * has probably made progress and we should delay the reset.  This
> > +	 * could fail if the GPU got in an infinite loop in the CL, but that
> > +	 * is pretty unlikely outside of an i-g-t testcase.
> > +	 */
> >  	if (*timedout_ctca != ctca || *timedout_ctra != ctra) {
> >  		*timedout_ctca = ctca;
> >  		*timedout_ctra = ctra;
> > +
> > +		list_add(&sched_job->list, &sched_job->sched->pending_list);
> >  		return DRM_GPU_SCHED_STAT_NOMINAL;
> >  	}
> > 
> > @@ -790,11 +792,13 @@ v3d_csd_job_timedout(struct drm_sched_job *sched_job)
> >  	struct v3d_dev *v3d = job->base.v3d;
> >  	u32 batches = V3D_CORE_READ(0, V3D_CSD_CURRENT_CFG4(v3d->ver));
> >  
> > -	/* If we've made progress, skip reset and let the timer get
> > -	 * rearmed.
> > +	/* If we've made progress, skip reset, add the job to the pending
> > +	 * list, and let the timer get rearmed.
> >  	 */
> >  	if (job->timedout_batches != batches) {
> >  		job->timedout_batches = batches;
> > +
> > +		list_add(&sched_job->list, &sched_job->sched->pending_list);
> >  		return DRM_GPU_SCHED_STAT_NOMINAL;
> >  	}
> > 
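For readers outside the thread, the leak the patch fixes comes down to one invariant: the scheduler only frees jobs it finds on its pending list, and the core removes a job from that list before calling the driver's timeout handler. The following is a standalone sketch of that pattern with a mock scheduler and a hand-rolled intrusive list; the `mock_*` names and the progress counter are illustrative inventions, not the real DRM scheduler API.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal intrusive doubly-linked list, in the style of the kernel's
 * struct list_head. */
struct list_head {
	struct list_head *prev, *next;
};

static void list_init(struct list_head *h) { h->prev = h->next = h; }

/* Insert 'n' right after 'h', like the kernel's list_add(). */
static void list_add(struct list_head *n, struct list_head *h)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}

static void list_del(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;
}

static bool list_empty(const struct list_head *h) { return h->next == h; }

struct mock_sched {
	struct list_head pending_list;	/* jobs the scheduler will free */
};

struct mock_job {
	struct list_head list;
	struct mock_sched *sched;
	unsigned int timedout_progress;	/* progress seen at last timeout */
};

/* Mimics the fixed v3d handlers: the core already pulled the job off
 * pending_list before calling us. If the hardware made progress, skip
 * the reset and put the job back, so it is freed when it completes;
 * otherwise report that a reset is needed. */
static bool mock_timedout(struct mock_job *job, unsigned int progress)
{
	if (job->timedout_progress != progress) {
		job->timedout_progress = progress;
		list_add(&job->list, &job->sched->pending_list);
		return false;	/* no reset, timer gets rearmed */
	}
	return true;		/* no progress: reset */
}
```

Without the `list_add()` call, the "made progress" branch leaves the job off `pending_list` forever, which is exactly the leak described in the commit message.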