On 16/05/2025 12:53, Tvrtko Ursulin wrote:

On 16/05/2025 08:28, Philipp Stanner wrote:
On Thu, 2025-05-15 at 17:17 +0100, Tvrtko Ursulin wrote:

On 15/05/2025 16:00, Christian König wrote:
Sometimes drivers need to be able to submit multiple jobs which depend on
each other to different schedulers at the same time, but using
drm_sched_job_add_dependency() can't fail any more after the first job is
initialized.

This function preallocates memory for dependency slots so that no ENOMEM
can come later while adding dependencies.

v2: rework implementation and documentation
v3: rework from scratch, use separate function to add preallocated deps

I think we agreed to not put change logs into commit messages anymore :)

They aren't useful for any reader. Whoever needs the changelog afterwards
can retrieve it through the mail thread link that we add.


Signed-off-by: Christian König <christian.koe...@amd.com>
---
   drivers/gpu/drm/scheduler/sched_main.c | 45 ++++++++++++++++++++++++++
   include/drm/gpu_scheduler.h            |  4 +++
   2 files changed, 49 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index f7118497e47a..b95e7089aa70 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -858,6 +858,51 @@ void drm_sched_job_arm(struct drm_sched_job *job)
   }
   EXPORT_SYMBOL(drm_sched_job_arm);
+/**
+ * drm_sched_job_prealloc_dependency_slot - avoid ENOMEM on adding dependencies
+ * @job: scheduler job where dependencies will be added
+ * @id: id for the allocated slot
+ *
+ * Sometimes drivers need to be able to submit multiple jobs which depend on
+ * each other to different schedulers at the same time, but using
+ * drm_sched_job_add_dependency() can't fail any more after the first job is
+ * initialized.
+ *
+ * This function preallocates memory for a dependency slot so that no ENOMEM can
+ * come later while adding dependencies. The index of the preallocated slot is
+ * returned in @id.
+ *
+ * Return:
+ * 0 on success, or an error on failing to expand the array.
+ */
+int drm_sched_job_prealloc_dependency_slot(struct drm_sched_job *job,
+					   u32 *id)
+{
+	return xa_alloc(&job->dependencies, id, NULL, xa_limit_32b, GFP_KERNEL);
+}
+EXPORT_SYMBOL(drm_sched_job_prealloc_dependency_slot);
+
+/**
+ * drm_sched_job_add_prealloc_dep - add dependency to preallocated slot
+ * @job: scheduler job where dependencies will be added
+ * @id: the preallocated slot index
+ * @fence: the dependency to add
+ *
+ * Consumes @fence and adds it to the preallocated slot dependency.
+ */
+void drm_sched_job_add_prealloc_dep(struct drm_sched_job *job, u32 id,
+				    struct dma_fence *fence)
+{
+	fence = xa_store(&job->dependencies, id, fence, GFP_ATOMIC);

Add assert that the passed id exists (was preallocated) and is NULL?

You

Hm?


Also, if someone preallocates and does not consume the slot will that
confuse the iteration in drm_sched_job_dependency()?

drm_sched_job_add_dependency() you mean.

I was actually thinking of drm_sched_job_dependency(), because that looked like it would skip dependencies upon encountering an unconsumed preallocated slot, but yes, drm_sched_job_add_dependency() could explode even earlier if adding a normal dependency after preallocating a slot.

Yes, it would. All operations simply give you NULL for those slots. So it
seems to me you have to check for NULL wherever a preallocated slot
might drop out. That would then be a bug.

It's kind of tricky, all that. It's a pity that Wilcox didn't answer
our questions about the idiomatic way to do it.

Maybe reserving slots with already signaled fences wasn't such a bad
idea after all?

If we go for the NULL approach, it's probably the only sane way to then
check for NULL wherever dependencies are accessed :(

Opinions?

Well, if the xarray API returns NULL consistently, the approach from this patch is fine, I think.

We just need to add two more checks to the above-mentioned functions,

I need to correct myself, drm_sched_job_dependency() wouldn't be able to just skip NULLs since it relies on NULL for "no more dependencies". We would need to track something like job->max_dependency and terminate on job->last_dependency > job->max_dependency or so.

Regards,

Tvrtko

some more unit tests probably to make sure, and that should be fine for now.

On the bikeshedding front I would perhaps suggest:

  - drm_sched_job_preallocate_dependency()
  - drm_sched_job_replace_dependency()

Reads a little bit more aligned with the rest of the API and a bit easier on the eyes, to my eyes at least.

Regards,

Tvrtko
