On Wed, 2025-04-09 at 16:04 +0200, Philipp Stanner wrote:
> +Cc Matthew
>
> On Wed, 2025-04-09 at 15:55 +0200, Christian König wrote:
> > On 09.04.25 at 12:28, Philipp Stanner wrote:
> > > On Fri, 2025-03-21 at 16:58 +0100, Christian König wrote:
> > > > Sometimes drivers need to be able to submit multiple jobs which
> > > > depend on each other to different schedulers at the same time, but
> > > > using drm_sched_job_add_dependency() can't fail any more after the
> > > > first job is initialized.
> > > >
> > > > This function preallocates memory for dependency slots so that no
> > > > ENOMEM can come later while adding dependencies.
> > > >
> > > > v2: rework implementation and documentation
> > > >
> > > > Signed-off-by: Christian König <christian.koe...@amd.com>
> > > > ---
> > > >  drivers/gpu/drm/scheduler/sched_main.c | 44 ++++++++++++++++++++++++--
> > > >  include/drm/gpu_scheduler.h            |  2 ++
> > > >  2 files changed, 43 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > > > index 4d4219fbe49d..ee3701f346b2 100644
> > > > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > > > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > > > @@ -852,6 +852,39 @@ void drm_sched_job_arm(struct drm_sched_job *job)
> > > >  }
> > > >  EXPORT_SYMBOL(drm_sched_job_arm);
> > > >
> > > > +/**
> > > > + * drm_sched_job_prealloc_dependency_slots - avoid ENOMEM on adding dependencies
> > > > + * @job: scheduler job where dependencies will be added
> > > > + * @num_deps: number of dependencies to preallocate slots for
> > > > + *
> > > > + * Sometimes drivers need to be able to submit multiple jobs which depend on
> > > > + * each other to different schedulers at the same time, but using
> > > > + * drm_sched_job_add_dependency() can't fail any more after the first job is
> > > > + * initialized.
> > > > + *
> > > > + * This function preallocates memory for dependency slots so that no ENOMEM can
> > > > + * come later while adding dependencies.
> > > > + *
> > > > + * Return:
> > > > + * 0 on success, or an error on failing to expand the array.
> > > > + */
> > > > +int drm_sched_job_prealloc_dependency_slots(struct drm_sched_job *job,
> > > > +					    unsigned int num_deps)
> > > > +{
> > > > +	u32 id = 0;
> > > > +	int ret;
> > > > +
> > > > +	while (num_deps--) {
> > > > +		ret = xa_alloc(&job->dependencies, &id, XA_ZERO_ENTRY,
> > > > +			       xa_limit_32b, GFP_KERNEL);
> > > I've had some time to re-read the xarray documentation and I think that
> > > this is what xa_reserve() was born for. The Book of
> > > Documentation/core-api/xarray.rst sayeth:
> > >
> > > "Sometimes you need to ensure that a subsequent call to xa_store()
> > > will not need to allocate memory. The xa_reserve() function
> > > will store a reserved entry at the indicated index. Users of the
> > > normal API will see this entry as containing ``NULL``."
> > >
> > > That's far better; this way we don't have to use that more or less
> > > xarray-internal flag.
I've tried to look through the code and think it through…

> > Yeah, I have seen that as well. The reason why I didn't follow this
> > route was that I wasn't sure whether I'd then need to check for NULL
> > entries while iterating over the XA.

AFAICS, when you use xa_reserve(), xa_load() and xa_for_each() will
return NULL for the reserved entries – therefore potentially blowing up
the scheduler without NULL checks if someone uses the new prealloc
function without actually filling in the dependencies later.

At least the documentation says so:

"The xa_reserve() function will store a reserved entry at the indicated
index. Users of the normal API will see this entry as containing
``NULL``."

So that's definitely not a good idea.

BUT the same seems to be the case for xa_alloc(…, XA_ZERO_ENTRY, …)?

xa_load() will *definitely* return NULL, since it utilizes
xa_zero_to_null(). We have one use of it, in sched_entity.c. That use
should only ever evaluate a valid dependency, so realistically speaking
it can't be NULL.

So the more interesting question is how xa_for_each(), our main work
horse, behaves. It uses xa_find(), which uses xas_find(), which… seems
to be OK? xa_find()'s docu says:

"* Return: The entry, if found, otherwise %NULL."

???

I agree we should aim for documenting that better. It could also make
sense to _consider_ changing xa_for_each() so that it doesn't return
reserved entries, but only 'used' entries.

P.

> > Additional to that, I couldn't figure out of hand how to determine
> > the next free index slot.
> >
> > Have you found any example how to use that? I mean the documentation
> > could certainly be improved a bit.

> Maybe Matthew can help us out here?
>
> Matthew, what would be the idiomatic way to do this, and can we help
> out with improving the Xarray's documentation?
>
> Thx,
> P.

> > Regards,
> > Christian.
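To make the xa_reserve() alternative under discussion concrete, this is roughly what the preallocation helper could look like with it — a hypothetical sketch, not part of the patch. It reuses the patch's function signature, and it assumes the slot indices are known up front, which is exactly the open question Christian raises about finding the next free index in an xa_alloc()-managed array:

```c
/*
 * Hypothetical sketch (not from the patch): preallocating dependency
 * slots with xa_reserve() instead of xa_alloc(..., XA_ZERO_ENTRY, ...).
 * Note that readers going through the normal API (xa_load(),
 * xa_for_each()) see reserved entries as NULL, which is the hazard
 * described above.
 */
int drm_sched_job_prealloc_dependency_slots(struct drm_sched_job *job,
					    unsigned int num_deps)
{
	unsigned long index;
	int ret;

	/* Assumes indices 0..num_deps-1 are free, which is not
	 * guaranteed for an allocating xarray.
	 */
	for (index = 0; index < num_deps; index++) {
		/* Reserve the slot so a later xa_store() cannot fail. */
		ret = xa_reserve(&job->dependencies, index, GFP_KERNEL);
		if (ret)
			return ret;
	}

	return 0;
}
```

A later xa_store(&job->dependencies, index, fence, GFP_KERNEL) into a reserved slot then should not need to allocate, per the xarray documentation quoted above.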
> > > > +		if (ret != 0)
> > > > +			return ret;
> > > > +	}
> > > > +
> > > > +	return 0;
> > > > +}
> > > > +EXPORT_SYMBOL(drm_sched_job_prealloc_dependency_slots);
> > > > +
> > > >  /**
> > > >   * drm_sched_job_add_dependency - adds the fence as a job dependency
> > > >   * @job: scheduler job to add the dependencies to
> > > > @@ -878,10 +911,15 @@ int drm_sched_job_add_dependency(struct drm_sched_job *job,
> > > >  	 * engines involved, rather than the number of BOs.
> > > >  	 */
> > > >  	xa_for_each(&job->dependencies, index, entry) {
> > > > -		if (entry->context != fence->context)
> > > > +		if (xa_is_zero(entry)) {
> > > > +			/*
> > > > +			 * Reserved entries must not alloc memory, but let's
> > > > +			 * use GFP_ATOMIC just to be on the defensive side.
> > > > +			 */
> > > > +			xa_store(&job->dependencies, index, fence, GFP_ATOMIC);
> > > And regarding this – it can actually never happen, but you provide
> > > ATOMIC just to be sure?
> > >
> > > I think it would be better if we'd just run into an obvious bug here
> > > instead, so like a deadlock with GFP_KERNEL.
> > >
> > > That's how we do it with pointers that cannot be NULL, too. If the
> > > impossible were to happen and it were NULL, we'd crash.
> > >
> > > P.
> > > > +		} else if (entry->context != fence->context) {
> > > >  			continue;
> > > > -
> > > > -		if (dma_fence_is_later(fence, entry)) {
> > > > +		} else if (dma_fence_is_later(fence, entry)) {
> > > >  			dma_fence_put(entry);
> > > >  			xa_store(&job->dependencies, index, fence, GFP_KERNEL);
> > > >  		} else {
> > > > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> > > > index 1a7e377d4cbb..916e820b27ff 100644
> > > > --- a/include/drm/gpu_scheduler.h
> > > > +++ b/include/drm/gpu_scheduler.h
> > > > @@ -632,6 +632,8 @@ int drm_sched_job_init(struct drm_sched_job *job,
> > > >  			   u32 credits, void *owner);
> > > >  void drm_sched_job_arm(struct drm_sched_job *job);
> > > >  void drm_sched_entity_push_job(struct drm_sched_job *sched_job);
> > > > +int drm_sched_job_prealloc_dependency_slots(struct drm_sched_job *job,
> > > > +					    unsigned int num_deps);
> > > >  int drm_sched_job_add_dependency(struct drm_sched_job *job,
> > > >  				 struct dma_fence *fence);
> > > >  int drm_sched_job_add_syncobj_dependency(struct drm_sched_job *job,
> > >