On Fri, 2025-12-05 at 10:19 +0100, Christian König wrote:
> On 12/4/25 17:04, Alex Deucher wrote:
> > On Wed, Dec 3, 2025 at 4:24 AM Philipp Stanner <[email protected]> wrote:
> > > 
> > > +Cc Alex, Christian, Danilo
> > > 
> > > 
> > > On Mon, 2025-12-01 at 10:39 -0800, Matthew Brost wrote:
> > > > Stop open coding pending job list in drivers. Add pending job list
> > > > iterator which safely walks DRM scheduler list asserting DRM scheduler
> > > > is stopped.
> > > > 
> > > > v2:
> > > > - Fix checkpatch (CI)
> > > > v3:
> > > > - Drop locked version (Christian)
> > > > v4:
> > > > - Reorder patch (Niranjana)
> > > 
> > > Same with the changelog.
> > > > 
> > > > Signed-off-by: Matthew Brost <[email protected]>
> > > > Reviewed-by: Niranjana Vishwanathapura <[email protected]>
> > > > ---
> > > >  include/drm/gpu_scheduler.h | 50 +++++++++++++++++++++++++++++++++++++
> > > >  1 file changed, 50 insertions(+)
> > > > 
> > > > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> > > > index 385bf34e76fe..9d228513d06c 100644
> > > > --- a/include/drm/gpu_scheduler.h
> > > > +++ b/include/drm/gpu_scheduler.h
> > > > @@ -730,4 +730,54 @@ static inline bool drm_sched_job_is_signaled(struct drm_sched_job *job)
> > > >  		dma_fence_is_signaled(&s_fence->finished);
> > > >  }
> > > > 
> > > > +/**
> > > > + * struct drm_sched_pending_job_iter - DRM scheduler pending job iterator state
> > > > + * @sched: DRM scheduler associated with pending job iterator
> > > > + */
> > > > +struct drm_sched_pending_job_iter {
> > > > +	struct drm_gpu_scheduler *sched;
> > > > +};
> > > > +
> > > > +/* Drivers should never call this directly */
> > > > +static inline struct drm_sched_pending_job_iter
> > > > +__drm_sched_pending_job_iter_begin(struct drm_gpu_scheduler *sched)
> > > > +{
> > > > +	struct drm_sched_pending_job_iter iter = {
> > > > +		.sched = sched,
> > > > +	};
> > > > +
> > > > +	WARN_ON(!drm_sched_is_stopped(sched));
> > > > +	return iter;
> > > > +}
> > > > +
> > > > +/* Drivers should never call this directly */
> > > > +static inline void
> > > > +__drm_sched_pending_job_iter_end(const struct drm_sched_pending_job_iter iter)
> > > > +{
> > > > +	WARN_ON(!drm_sched_is_stopped(iter.sched));
> > > > +}
> > > > +
> > > > +DEFINE_CLASS(drm_sched_pending_job_iter, struct drm_sched_pending_job_iter,
> > > > +	     __drm_sched_pending_job_iter_end(_T),
> > > > +	     __drm_sched_pending_job_iter_begin(__sched),
> > > > +	     struct drm_gpu_scheduler *__sched);
> > > > +static inline void *
> > > > +class_drm_sched_pending_job_iter_lock_ptr(class_drm_sched_pending_job_iter_t *_T)
> > > > +{ return _T; }
> > > > +#define class_drm_sched_pending_job_iter_is_conditional false
> > > > +
> > > > +/**
> > > > + * drm_sched_for_each_pending_job() - Iterator for each pending job in scheduler
> > > > + * @__job: Current pending job being iterated over
> > > > + * @__sched: DRM scheduler to iterate over pending jobs
> > > > + * @__entity: DRM scheduler entity to filter jobs, NULL indicates no filter
> > > > + *
> > > > + * Iterator for each pending job in scheduler, filtering on an entity, and
> > > > + * enforcing scheduler is fully stopped
> > > > + */
> > > > +#define drm_sched_for_each_pending_job(__job, __sched, __entity)		\
> > > > +	scoped_guard(drm_sched_pending_job_iter, (__sched))			\
> > > > +		list_for_each_entry((__job), &(__sched)->pending_list, list)	\
> > > > +			for_each_if(!(__entity) || (__job)->entity == (__entity))
> > > > +
> > > >  #endif
> > > 
> > > 
> > > See my comments in the first patch. The docu doesn't mention at all why
> > > this new functionality exists and when and why users would be expected
> > > to use it.
> > > 
> > > As far as I remember from XDC, both AMD and Intel overwrite a timed out
> > > jobs buffer data in the rings on GPU reset. To do so, the driver needs
> > > the timedout job (passed through timedout_job() callback) and then
> > > needs all the pending non-broken jobs.
> > > 
> > > AFAICS your patch provides a generic iterator over the entire
> > > pending_list. How is a driver then supposed to determine which are the
> > > non-broken jobs (just asking, but that needs to be documented)?
> > > 
> > > Could it make sense to use a different iterator which only returns jobs
> > > of not belonging to the same context as the timedout-one?
> > > 
> > > Those are important questions that need to be addressed before merging
> > > that.
> > > 
> > > And if this works canonically (i.e., for basically everyone), it needs
> > > to be documented in drm_sched_resubmit_jobs() that this iterator is now
> > > the canonical way of handling timeouts.
> > > 
> > > Moreover, btw, just yesterday I added an entry to the DRM todo list
> > > which addresses drm_sched_resubmit_jobs(). If we merge this, that entry
> > > would have to be removed, too.
> > > 
> > > 
> > > @AMD: Would the code Matthew provides work for you? Please give your
> > > input. This is very important common infrastructure.
> > 
> > I don't think drm_sched_resubmit_jobs() can work for us without major
> > rework. For our kernel queues, we have a single queue on which jobs
> > for different clients are scheduled. When we reset the queue, we lose
> > all jobs on the queue and have to re-emit the non-guilty ones. We do
> > this at the ring level, i.e., we save the packets directly from the
> > ring and then re-emit the packets for the non-guilty contexts to the
> > freshly reset ring. This avoids running run_job() again which would
> > issue new fences and race with memory management, etc.
> > 
> > I think the following would be workable:
> > 1. driver job_timedout() callback flags the job as bad. resets the bad
> > queue, and calls drm_sched_resubmit_jobs()
> > 2. drm_sched_resubmit_jobs() walks the pending list and calls
> > run_job() for every job
> 
> Calling run_job() multiple times was one of the worst ideas I have ever seen.
> 
> The problem here is that you need a transactional approach to the internal
> driver state which is modified by ->run_job().
> 
> So for example if you have:
> ->run_job(A)
> ->run_job(B)
> ->run_job(C)
> 
> And after a reset you find that you need to re-submit only job B and A & C
> are filtered then that means that your driver state needs to get back before
> running job A.
> 
> > 2. driver run_job() callback looks to see if we already ran this job
> > and uses the original fence rather than allocating a new one
> 
> Nope, the problem is *all* drivers *must* use the original fence. Otherwise
> you always run into trouble.
> 
> We should not promote a driver interface which makes it extremely easy to
> shoot down the whole system.
> 
> > 3. driver run_job() callback checks to see if the job is guilty or
> > from the same context and if so, sets an error on the fences and
> > submits only the fence packet to the queue so that any follow up jobs
> > will properly synchronize if they need to wait on the fence from the
> > bad job.
> > 4. driver run_job() callback will submit the full packet stream for
> > non-guilty contexts
> > 
> > I guess we could use the iterator and implement that logic in the
> > driver directly rather than using drm_sched_resubmit_jobs().
> 
> Yeah, exactly that's the way to go.
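
Just to make that last point concrete: below is a very rough, completely
untested sketch of what such a driver-side loop over Matthew's iterator
could look like. amdgpu_ring_reemit_job() is a made-up placeholder for
whatever re-emission amdgpu already does at the ring level, "same entity
as the bad job" stands in for however amdgpu actually tracks guilty
contexts, and the scheduler is assumed to already be in the fully
stopped state the iterator asserts:

static void amdgpu_reemit_innocent_jobs(struct amdgpu_ring *ring,
					struct drm_sched_job *bad)
{
	struct drm_sched_job *job;

	/* The iterator WARNs if the scheduler is not fully stopped. */
	drm_sched_for_each_pending_job(job, &ring->sched, NULL) {
		if (job->entity == bad->entity) {
			/*
			 * Guilty context: don't re-emit the packets, just
			 * propagate an error so waiters see a failure
			 * instead of hanging (this is roughly where Alex's
			 * "emit only the fence packet" step would live).
			 */
			dma_fence_set_error(&job->s_fence->finished,
					    -ECANCELED);
			continue;
		}

		/* Made-up helper: re-emit the saved ring packets. */
		amdgpu_ring_reemit_job(ring, job);
	}
}
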
Sorry, I guess my message was confusing. I didn't mean that anyone
should use drm_sched_resubmit_jobs() at all. That function is
deprecated, and since there are still users, we can't easily modify it.

I only mentioned it because its docstring should tell users what they
are supposed to do *instead* of calling that function. Currently it is
just marked as deprecated, without pointing users to an alternative
(rough sketch of what I have in mind at the very bottom of this mail).

Anyway: can you take a look at Matthew's iterator and check whether it
would work for you? If we ended up with an at least mostly generic
resubmit solution that is used by both Xe and amdgpu, that would be a
huge leap forward.

P.

> 
> Christian.
> > 
> > Alex
> 
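
P.S.: Regarding the drm_sched_resubmit_jobs() docstring, the kind of
note I have in mind is roughly the following (wording obviously up for
debate, and it assumes Matthew's iterator goes in):

/**
 * drm_sched_resubmit_jobs - Deprecated, don't use in new code
 *
 * [...]
 *
 * Instead of re-running jobs through the scheduler, drivers should walk
 * the pending list with drm_sched_for_each_pending_job() from their
 * timedout_job() callback and re-emit the non-broken work at the ring
 * level, reusing the original hardware fences.
 */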
