On 24/04/2025 10:55, Philipp Stanner wrote:
The waitqueue that ensures that drm_sched_fini() blocks until the
pending_list has become empty could theoretically cause that function to
block for a very long time. That, ultimately, could block userspace
processes and prevent them from being killed with SIGKILL.
When a driver calls drm_sched_fini(), it is safe to assume that all
still pending jobs are no longer needed. Thus, they can be cancelled,
thereby ensuring that drm_sched_fini() will return relatively quickly.
Implement a new helper to stop all work items / submission except for
the drm_sched_backend_ops.run_job().
Implement a driver callback, kill_fence_context(), that instructs the
driver to kill the fence context associated with this scheduler, thereby
causing all pending hardware fences to be signalled.
Call those new routines in drm_sched_fini() and ensure backwards
compatibility if the new callback is not implemented.
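For illustration, a driver-side implementation of the new callback could
look roughly like the sketch below. The foo_* names, the fence list and
its lock are assumptions made up for this example, not part of the patch:

static void foo_kill_fence_context(struct drm_gpu_scheduler *sched)
{
	struct foo_device *fdev = container_of(sched, struct foo_device, sched);
	struct foo_fence *fence, *tmp;

	/* Stop the ring so no further hardware fences are emitted. */
	foo_hw_stop_ring(fdev);

	/*
	 * Signal all fences still in flight with an error. Once every
	 * pending fence is signalled, free_job() runs for each pending
	 * job and drm_sched_fini() can return.
	 */
	spin_lock(&fdev->fence_lock);
	list_for_each_entry_safe(fence, tmp, &fdev->pending_fences, node) {
		list_del(&fence->node);
		dma_fence_set_error(&fence->base, -ECANCELED);
		dma_fence_signal(&fence->base);
	}
	spin_unlock(&fdev->fence_lock);
}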
Suggested-by: Danilo Krummrich <d...@redhat.com>
Signed-off-by: Philipp Stanner <pha...@kernel.org>
---
drivers/gpu/drm/scheduler/sched_main.c | 47 +++++++++++++++++---------
include/drm/gpu_scheduler.h | 11 ++++++
2 files changed, 42 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index ea82e69a72a8..c2ad6c70bfb6 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -1390,31 +1390,46 @@ drm_sched_no_jobs_pending(struct drm_gpu_scheduler *sched)
return empty;
}
+/**
+ * drm_sched_cancel_jobs_and_wait - trigger freeing of all pending jobs
+ * @sched: scheduler instance
+ *
+ * Must only be called if &struct drm_sched_backend_ops.kill_fence_context is
+ * implemented.
+ *
+ * Instructs the driver to kill the fence context associated with this scheduler,
+ * thereby signalling all pending fences. This, in turn, will trigger
+ * &struct drm_sched_backend_ops.free_job to be called for all pending jobs.
+ * The function then blocks until all pending jobs have been freed.
+ */
+static inline void
+drm_sched_cancel_jobs_and_wait(struct drm_gpu_scheduler *sched)
+{
+ sched->ops->kill_fence_context(sched);
+ wait_event(sched->pending_list_waitque, drm_sched_no_jobs_pending(sched));
+}
+
/**
* drm_sched_fini - Destroy a gpu scheduler
*
* @sched: scheduler instance
*
- * Tears down and cleans up the scheduler.
- *
- * Note that this function blocks until all the fences returned by
- * &struct drm_sched_backend_ops.run_job have been signalled.
+ * Tears down and cleans up the scheduler. Might leak memory if
+ * &struct drm_sched_backend_ops.kill_fence_context is not implemented.
*/
void drm_sched_fini(struct drm_gpu_scheduler *sched)
{
struct drm_sched_entity *s_entity;
int i;
- /*
- * Jobs that have neither been scheduled or which have timed out are
- * gone by now, but jobs that have been submitted through
- * backend_ops.run_job() and have not yet terminated are still pending.
- *
- * Wait for the pending_list to become empty to avoid leaking those jobs.
- */
- drm_sched_submission_and_timeout_stop(sched);
- wait_event(sched->pending_list_waitque, drm_sched_no_jobs_pending(sched));
- drm_sched_free_stop(sched);
+ if (sched->ops->kill_fence_context) {
+ drm_sched_submission_and_timeout_stop(sched);
+ drm_sched_cancel_jobs_and_wait(sched);
+ drm_sched_free_stop(sched);
+ } else {
+ /* We're in "legacy free-mode" and ignore potential mem leaks */
+ drm_sched_wqueue_stop(sched);
+ }
for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; i++) {
struct drm_sched_rq *rq = sched->sched_rq[i];
@@ -1502,7 +1517,7 @@ bool drm_sched_wqueue_ready(struct drm_gpu_scheduler *sched)
EXPORT_SYMBOL(drm_sched_wqueue_ready);
/**
- * drm_sched_wqueue_stop - stop scheduler submission
+ * drm_sched_wqueue_stop - stop scheduler submission and freeing
Looks like the kerneldoc corrections (below too) belong to the previous
patch. Irrelevant if you decide to squash them though.
* @sched: scheduler instance
*
* Stops the scheduler from pulling new jobs from entities. It also stops
@@ -1518,7 +1533,7 @@ void drm_sched_wqueue_stop(struct drm_gpu_scheduler *sched)
EXPORT_SYMBOL(drm_sched_wqueue_stop);
/**
- * drm_sched_wqueue_start - start scheduler submission
+ * drm_sched_wqueue_start - start scheduler submission and freeing
* @sched: scheduler instance
*
* Restarts the scheduler after drm_sched_wqueue_stop() has stopped it.
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index d0b1f416b4d9..8630b4a26f10 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -509,6 +509,17 @@ struct drm_sched_backend_ops {
* and it's time to clean it up.
*/
void (*free_job)(struct drm_sched_job *sched_job);
+
+ /**
+ * @kill_fence_context: kill the fence context belonging to this scheduler
Which fence context would that be? ;)
Also, "fence context" would be a new terminology in gpu_scheduler.h API
level. You could call it ->sched_fini() or similar to signify at which
point in the API it gets called and then the fact it takes sched as
parameter would be natural.
We also probably want some commentary on the topic of indefinite (or
very long at least) blocking a thread exit / SIGINT/TERM/KILL time.
Cover letter touches upon that problem but I don't see you address it.
Is the idea to let drivers shoot themselves in the foot or what?
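For example, even just bounding the wait and shouting would be better
than blocking the task uninterruptibly forever. A rough sketch (the 10
second timeout is an arbitrary pick, not something from this series):

	while (!wait_event_timeout(sched->pending_list_waitque,
				   drm_sched_no_jobs_pending(sched),
				   msecs_to_jiffies(10000)))
		dev_warn(sched->dev,
			 "%s: jobs still pending, driver stuck?\n", __func__);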
Regards,
Tvrtko
+ *
+ * Needed to cleanly tear the scheduler down in drm_sched_fini(). This
+ * callback will cause all hardware fences to be signalled by the driver,
+ * which, ultimately, ensures that all jobs get freed before teardown.
+ *
+ * This callback is optional, but it is highly recommended to implement it.
+ */
+ void (*kill_fence_context)(struct drm_gpu_scheduler *sched);
};
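For completeness, wiring the new callback up would then be a one-liner in
the driver's ops table; again with made-up foo_* names:

static const struct drm_sched_backend_ops foo_sched_ops = {
	.run_job		= foo_run_job,
	.timedout_job		= foo_timedout_job,
	.free_job		= foo_free_job,
	.kill_fence_context	= foo_kill_fence_context,
};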
/**