On Wed, 2025-06-04 at 17:07 +0200, Simona Vetter wrote:
> On Wed, Jun 04, 2025 at 11:41:25AM +0200, Christian König wrote:
> > On 6/4/25 10:16, Philipp Stanner wrote:
> > > struct drm_sched_init_args provides the possibility of letting the
> > > scheduler use user-controlled workqueues, instead of the scheduler
> > > creating its own workqueues. It's currently not documented who would
> > > want to use that.
> > > 
> > > Not sharing the submit_wq between driver and scheduler has the
> > > advantage that no negative interference between them can occur
> > > (e.g., MMU notifier callbacks waiting for fences to get signaled).
> > > A separate timeout_wq should rarely be necessary, since using the
> > > system_wq could, in the worst case, delay a timeout.
> > > 
> > > Discourage the usage of own workqueues in the documentation.
> > > 
> > > Suggested-by: Danilo Krummrich <d...@kernel.org>
> > > Signed-off-by: Philipp Stanner <pha...@kernel.org>
> > > ---
> > >  include/drm/gpu_scheduler.h | 7 +++++--
> > >  1 file changed, 5 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> > > index 81dcbfc8c223..11740d745223 100644
> > > --- a/include/drm/gpu_scheduler.h
> > > +++ b/include/drm/gpu_scheduler.h
> > > @@ -590,14 +590,17 @@ struct drm_gpu_scheduler {
> > >   *
> > >   * @ops: backend operations provided by the driver
> > >   * @submit_wq: workqueue to use for submission. If NULL, an ordered wq is
> > > - *        allocated and used.
> > > + *        allocated and used. It is recommended to pass NULL unless there
> > > + *        is a good reason not to.
> > 
> > Yeah, that's probably a good idea. I'm not sure why xe and nouveau
> > actually wanted that.
> 
> The idea of this trick is that you have a fw scheduler which only has one
> queue, and a bunch of other things in your driver that also need to be
> stuffed into this fw queue (or handled by talking with the fw through
> these ringbuffers).
> 
> If you use one single-threaded wq for everything then you don't need
> additional locking anymore, and a lot of things become easier.
> 
> We should definitely document this trick better though, I didn't find any
> place where that was documented.
> 
> Maybe a new overview section about "how to concurrency with drm/sched"?
> That's also a good place to better highlight the existing documentation
> for the 2nd part here.
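
For the archives, the trick could look roughly like this. This is only a
hedged sketch: my_fw_wq, my_sched_ops, my_fw_ring_work and the concrete
field values are hypothetical, and the field names follow struct
drm_sched_init_args as of recent kernels:

```c
/* One ordered (i.e. single-threaded) workqueue shared by the scheduler
 * and all other driver work that talks to the same FW queue.  Because
 * an ordered wq executes at most one item at a time, run_job(), timeout
 * handling and the driver's own FW work are serialized against each
 * other without any extra locking.
 */
struct workqueue_struct *my_fw_wq;

my_fw_wq = alloc_ordered_workqueue("my-fw-wq", 0);

struct drm_sched_init_args args = {
	.ops          = &my_sched_ops,
	.submit_wq    = my_fw_wq,   /* shared, single-threaded */
	.num_rqs      = 1,          /* fw scheduler: one queue */
	.credit_limit = 64,
	.timeout      = msecs_to_jiffies(500),
	.timeout_wq   = my_fw_wq,   /* serialize timeouts, too */
	.name         = "my-sched",
	.dev          = dev,
};

ret = drm_sched_init(&sched, &args);

/* Driver-internal work touching the same FW ring goes onto the
 * same wq: */
queue_work(my_fw_wq, &my_fw_ring_work);
```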


Let us first build consensus on when it makes sense for the driver to
provide its own workqueues. Please correct any of the following
statements that are wrong:

   1. Using a single-threaded workqueue for timeout_wq is useful for
      guaranteeing that timeout handlers are executed in the correct
      order. (Christian)
   2. Independently from the point above, using the very same
      single-threaded wq for both timeout_wq and submit_wq allows for
      dropping locks from the driver's callbacks.
   3. Driver and scheduler sharing workqueues risks deadlocks, as
      described by Danilo.


I'm wondering about the following now:
   1. Why does the scheduler by default use the system_wq for timeouts?
      Could it make sense for us to always use a single-threaded wq?
      After all, each scheduler only ever has 1 timeout work item in
      flight.
   2. Isn't point 2 from above purely a trick to avoid locking in the
      callbacks? If it is, I'm not convinced yet that we really want to
      actively recommend that. The callback implementations I've seen
      usually only need their fence context lock, and many drivers
      don't use their own workqueues to begin with.
      But I guess having a sentence somewhere teasing the possibility
      wouldn't hurt.


My current feeling is that we should document why the mechanism exists
and discourage using it. It's optional, not really necessary, and risks
deadlocks; making it more waterproof would require more substantial
work, such as the lockdep assertions suggested by Matthew.

I'm for the more lightweight way.


> 
> > >   * @num_rqs: Number of run-queues. This may be at most DRM_SCHED_PRIORITY_COUNT,
> > >   *      as there's usually one run-queue per priority, but may be less.
> > >   * @credit_limit: the number of credits this scheduler can hold from all jobs
> > >   * @hang_limit: number of times to allow a job to hang before dropping it.
> > >   * This mechanism is DEPRECATED. Set it to 0.
> > >   * @timeout: timeout value in jiffies for submitted jobs.
> > > - * @timeout_wq: workqueue to use for timeout work. If NULL, the system_wq is used.
> > > + * @timeout_wq: workqueue to use for timeout work. If NULL, the system_wq is
> > > + * used. It is recommended to pass NULL unless there is a good
> > > + * reason not to.
> > 
> > Well, that's a rather bad idea.
> > 
> > Using the same single-threaded work queue for the timeouts of
> > multiple scheduler instances has the major advantage of being able
> > to handle their occurrence sequentially.
> > 
> > In other words, multiple schedulers post their timeout work items on
> > the same queue; the first one to run resets the specific HW block in
> > question and cancels all timeouts and work items from other
> > schedulers which use the same HW block.
> > 
> > It was Sima, I and a few other people who came up with this approach
> > because both amdgpu and IIRC panthor implemented that in their own
> > specific way, and as usual got it wrong.
> > 
> > If I'm not completely mistaken this approach is now used by amdgpu,
> > panthor, xe and imagination and has proven to be rather flexible and
> > reliable. It just looks like we never documented that you should do
> > it this way.
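
For reference, that pattern could be sketched roughly as below. This is
a hedged sketch only: my_hw, my_hw_reset() and num_rings are
hypothetical, and the drm_sched_start() signature has varied across
kernel versions:

```c
/* All schedulers of one HW block share an ordered timeout_wq, so at
 * most one timedout_job() handler runs at a time.  The first handler
 * to run resets the block; handlers queued behind it on the shared wq
 * find the HW already reset.
 */
static enum drm_gpu_sched_stat
my_timedout_job(struct drm_sched_job *bad)
{
	struct my_hw *hw = to_my_hw(bad->sched);
	int i;

	/* Park all schedulers of this HW block; pass the guilty job
	 * only to its own scheduler. */
	for (i = 0; i < hw->num_rings; i++)
		drm_sched_stop(&hw->sched[i],
			       &hw->sched[i] == bad->sched ? bad : NULL);

	my_hw_reset(hw);

	for (i = 0; i < hw->num_rings; i++)
		drm_sched_start(&hw->sched[i], 0);

	return DRM_GPU_SCHED_STAT_NOMINAL;
}
```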
> 
> It is documented, just not here. See the note in
> drm_sched_backend_ops.timedout_job at the very bottom.
> 
> We should definitely have a lot more cross-links between the various
> pieces of this puzzle though, that's for sure :-)

The docu is in far better shape now than it was last year, but there's
still much to do.

However, we probably also don't want to scatter it over too many
places, as that makes it harder to keep the docu consistent.


P.

> 
> Cheers, Sima
> 
> > 
> > Regards,
> > Christian.
> > 
> > >   * @score: score atomic shared with other schedulers. May be
> > > NULL.
> > >   * @name: name (typically the driver's name). Used for debugging
> > >   * @dev: associated device. Used for debugging
> > 
> 
