Hi Philipp,
On 12/05/25 08:13, Philipp Stanner wrote:
On Tue, 2025-05-06 at 07:32 -0700, Matthew Brost wrote:
On Mon, May 05, 2025 at 07:41:09PM -0700, Matthew Brost wrote:
On Sat, May 03, 2025 at 05:59:52PM -0300, Maíra Canal wrote:
When the DRM scheduler times out, it's possible that the GPU isn't hung;
instead, a job may still be running, and there may be no valid reason to
reset the hardware. This can occur in two situations:

1. The GPU exposes some mechanism that ensures the GPU is still making
   progress. By checking this mechanism, we can safely skip the reset,
   rearm the timeout, and allow the job to continue running until
   completion. This is the case for v3d and Etnaviv.

2. TDR has fired before the IRQ that signals the fence. Consequently,
   the job actually finishes, but it triggers a timeout before signaling
   the completion fence.
We have both of these cases in Xe too. We implement the requeuing in Xe
via a driver-side function, xe_sched_add_pending_job(), but this looks
better and we will make use of it.
These two scenarios are problematic because we remove the job from the
`sched->pending_list` before calling `sched->ops->timedout_job()`. This
means that when the job finally signals completion (e.g. in the IRQ
handler), the scheduler won't call `sched->ops->free_job()`. As a result,
the job and its resources won't be freed, leading to a memory leak.

To resolve this issue, we create a new `drm_gpu_sched_stat` that allows a
driver to skip the reset. This new status will indicate that the job
should be reinserted into the pending list, and the driver will still
signal its completion.
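
For illustration, a driver's timedout_job() callback could use the new
status roughly as follows (just a sketch: the my_driver_*() helpers and
the progress check are made up for the example; only
DRM_GPU_SCHED_STAT_RUNNING is what this patch adds):

static enum drm_gpu_sched_stat
my_driver_timedout_job(struct drm_sched_job *sched_job)
{
	struct my_driver_job *job = to_my_driver_job(sched_job);

	/*
	 * Hypothetical progress check: if the GPU has advanced since the
	 * last check, skip the reset; the scheduler rearms the timeout and
	 * the job stays on the pending list until its fence signals.
	 */
	if (my_driver_job_made_progress(job))
		return DRM_GPU_SCHED_STAT_RUNNING;

	my_driver_reset_and_resubmit(job);
	return DRM_GPU_SCHED_STAT_NOMINAL;
}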
Signed-off-by: Maíra Canal <mca...@igalia.com>
Reviewed-by: Matthew Brost <matthew.br...@intel.com>
Wait, never mind, I think one issue is below.
---
drivers/gpu/drm/scheduler/sched_main.c | 14 ++++++++++++++
include/drm/gpu_scheduler.h | 2 ++
2 files changed, 16 insertions(+)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 829579c41c6b5d8b2abce5ad373c7017469b7680..68ca827d77e32187a034309f881135dbc639a9b4 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -568,6 +568,17 @@ static void drm_sched_job_timedout(struct work_struct *work)
 			job->sched->ops->free_job(job);
 			sched->free_guilty = false;
 		}
+
+		/*
+		 * If the driver indicated that the GPU is still running and
+		 * wants to skip the reset, reinsert the job back into the
+		 * pending list and rearm the timeout.
+		 */
+		if (status == DRM_GPU_SCHED_STAT_RUNNING) {
+			spin_lock(&sched->job_list_lock);
+			list_add(&job->list, &sched->pending_list);
+			spin_unlock(&sched->job_list_lock);
+		}
I think you need to requeue the free_job wq here. It is possible the
free_job wq ran, didn't find a job, and went to sleep; then we add a
signaled job here which will never get freed.
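
Something like this on top of the hunk above, perhaps (just a sketch;
I'm assuming the work-item based scheduler here, i.e. sched->submit_wq
and sched->work_free_job, or whatever helper in sched_main.c wraps that
queue_work()):

	if (status == DRM_GPU_SCHED_STAT_RUNNING) {
		spin_lock(&sched->job_list_lock);
		list_add(&job->list, &sched->pending_list);
		spin_unlock(&sched->job_list_lock);

		/*
		 * The job may have signaled while it was off the pending
		 * list; kick the free_job work so a signaled job that is
		 * requeued here still gets freed.
		 */
		queue_work(sched->submit_wq, &sched->work_free_job);
	}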
I wonder if that could be solved by holding job_list_lock a bit longer.
free_job_work will try to check the list for the next signaled job, but
will wait for the lock.
If that works, we could completely rely on the standard mechanism
without requeuing, which would be neat.
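
If I understand the suggestion correctly, it would look roughly like this
in drm_sched_job_timedout() (a heavily simplified sketch; the fence error
handling and the free_guilty bits are elided):

	spin_lock(&sched->job_list_lock);
	job = list_first_entry_or_null(&sched->pending_list,
				       struct drm_sched_job, list);
	if (job) {
		list_del_init(&job->list);

		/*
		 * Hold job_list_lock across the driver callback so that
		 * free_job work cannot scan the pending list (and miss
		 * this job) before we decide whether to reinsert it.
		 */
		status = job->sched->ops->timedout_job(job);

		if (status == DRM_GPU_SCHED_STAT_RUNNING)
			list_add(&job->list, &sched->pending_list);
	}
	spin_unlock(&sched->job_list_lock);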
I believe it works. However, the tradeoff would be holding the lock for
the entire GPU reset (in the cases where the GPU actually hung), which
looks like a lot of time.
Do you think it's reasonable to do so?
Best Regards,
- Maíra
P.
Matt