On Wed, Jun 04, 2025 at 05:07:15PM +0200, Simona Vetter wrote:
> On Wed, Jun 04, 2025 at 11:41:25AM +0200, Christian König wrote:
> > On 6/4/25 10:16, Philipp Stanner wrote:
> > > struct drm_sched_init_args provides the possibility of letting the
> > > scheduler use user-controlled workqueues, instead of the scheduler
> > > creating its own workqueues. It's currently not documented who would
> > > want to use that.
> > >
> > > Not sharing the submit_wq between driver and scheduler has the advantage
> > > of no negative interference between them being able to occur (e.g.,
> > > MMU notifier callbacks waiting for fences to get signaled). A separate
> > > timeout_wq should rarely be necessary, since using the system_wq could,
> > > in the worst case, delay a timeout.
> > >
> > > Discourage the usage of own workqueues in the documentation.
> > >
> > > Suggested-by: Danilo Krummrich <d...@kernel.org>
> > > Signed-off-by: Philipp Stanner <pha...@kernel.org>
> > > ---
> > >  include/drm/gpu_scheduler.h | 7 +++++--
> > >  1 file changed, 5 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> > > index 81dcbfc8c223..11740d745223 100644
> > > --- a/include/drm/gpu_scheduler.h
> > > +++ b/include/drm/gpu_scheduler.h
> > > @@ -590,14 +590,17 @@ struct drm_gpu_scheduler {
> > >   *
> > >   * @ops: backend operations provided by the driver
> > >   * @submit_wq: workqueue to use for submission. If NULL, an ordered wq is
> > > - *             allocated and used.
> > > + *             allocated and used. It is recommended to pass NULL unless there
> > > + *             is a good reason not to.
> >
> > Yeah, that's probably a good idea. I'm not sure why xe and nouveau actually
> > wanted that.

At one point, before workqueues could share a lockdep map, we had to pass in
a workqueue to avoid running out of lockdep keys. That restriction is now
gone, so Xe passes in NULL. Part of the reasoning was also that there was an
interface to pass in the TDR workqueue, so one was added for the submit
workqueue.

Xe does have an upcoming use for this, though. We have a mode where multiple
queues share FW resources, so interaction with the FW across those queues
needs to be exclusive. We therefore use a single submit workqueue for queues
sharing FW resources, which avoids locks in the scheduler ops.

queues == GPU scheduler / entity in this context.

> The idea of this trick is that you have a fw scheduler which only has one
> queue, and a bunch of other things in your driver that also need to be
> stuffed into this fw queue (or handled by talking with the fw through
> these ringbuffers).
>
> If you use one single-threaded wq for everything then you don't need
> additional locking anymore, and a lot of things become easier.
>

Yes, this is how Xe avoids locks in all scheduler ops. The same applies to
the upcoming use case above - multiple queues use the same single-threaded
wq.

> We should definitely document this trick better though, I didn't find any
> place where that was documented.
>

This is a good idea.

> Maybe a new overview section about "how to concurrency with drm/sched"?
> That's also a good place to better highlight the existing documentation
> for the 2nd part here.
>
> > >   * @num_rqs: Number of run-queues. This may be at most DRM_SCHED_PRIORITY_COUNT,
> > >   *           as there's usually one run-queue per priority, but may be less.
> > >   * @credit_limit: the number of credits this scheduler can hold from all jobs
> > >   * @hang_limit: number of times to allow a job to hang before dropping it.
> > >   *              This mechanism is DEPRECATED. Set it to 0.
> > >   * @timeout: timeout value in jiffies for submitted jobs.
> > > - * @timeout_wq: workqueue to use for timeout work. If NULL, the system_wq is used.
> > > + * @timeout_wq: workqueue to use for timeout work. If NULL, the system_wq is
> > > + *              used. It is recommended to pass NULL unless there is a good
> > > + *              reason not to.
> >
> > Well, that's a rather bad idea.
> >

Yea, I've found that using system workqueues in driver code usually creates
problems. In Xe, we pass in a single ordered workqueue shared among all
queues for the TDR. GT (device) resets are also run on this ordered
workqueue to avoid jobs timing out in parallel.

I think most drivers would benefit from this type of design.

Matt

> > Using the same single-threaded work queue for the timeout of multiple
> > scheduler instances has the major advantage of being able to handle
> > their occurrence sequentially.
> >
> > In other words multiple schedulers post their timeout work items on the
> > same queue, the first one to run resets the specific HW block in
> > question and cancels all timeouts and work items from other schedulers
> > which use the same HW block.
> >
> > It was Sima, I and a few other people who came up with this approach
> > because both amdgpu and IIRC panthor implemented that in their own
> > specific way, and as usual got it wrong.
> >
> > If I'm not completely mistaken this approach is now used by amdgpu,
> > panthor, xe and imagination and has proven to be rather flexible and
> > reliable. It just looks like we never documented that you should do it
> > this way.
>
> It is documented, just not here. See the note in
> drm_sched_backend_ops.timedout_job at the very bottom.
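
If we end up writing that overview section, a small sketch might be enough to
show both patterns at once. Everything below is hypothetical (the xyz_* names
and the credit/timeout values are made up, not taken from Xe or any other
driver); the point is just that all schedulers feeding the same FW ring share
one ordered submit_wq, and all of them share one ordered timeout_wq:

#include <linux/device.h>
#include <linux/errno.h>
#include <linux/jiffies.h>
#include <linux/workqueue.h>

#include <drm/gpu_scheduler.h>

struct xyz_device {
        struct device *dev;
        /* Shared by every scheduler that talks to the same FW ring, so
         * their scheduler ops are serialized without driver-side locks. */
        struct workqueue_struct *submit_wq;
        /* Shared by every scheduler for timeout work, so timeouts and any
         * resets done from them run strictly one after the other. */
        struct workqueue_struct *timeout_wq;
};

struct xyz_queue {
        struct drm_gpu_scheduler sched;
};

/* The driver's run_job()/timedout_job()/free_job() callbacks would be
 * filled in here; omitted in this sketch. */
static const struct drm_sched_backend_ops xyz_sched_ops;

static int xyz_device_wq_init(struct xyz_device *xdev)
{
        xdev->submit_wq = alloc_ordered_workqueue("xyz-submit", 0);
        xdev->timeout_wq = alloc_ordered_workqueue("xyz-tdr", 0);
        if (!xdev->submit_wq || !xdev->timeout_wq) {
                if (xdev->submit_wq)
                        destroy_workqueue(xdev->submit_wq);
                if (xdev->timeout_wq)
                        destroy_workqueue(xdev->timeout_wq);
                return -ENOMEM;
        }
        return 0;
}

static int xyz_queue_init(struct xyz_device *xdev, struct xyz_queue *q)
{
        const struct drm_sched_init_args args = {
                .ops = &xyz_sched_ops,
                .submit_wq = xdev->submit_wq,           /* shared, ordered */
                .num_rqs = DRM_SCHED_PRIORITY_COUNT,
                .credit_limit = 64,                     /* made-up value */
                .hang_limit = 0,                        /* deprecated, keep 0 */
                .timeout = msecs_to_jiffies(5000),      /* made-up value */
                .timeout_wq = xdev->timeout_wq,         /* shared, ordered */
                .name = "xyz",
                .dev = xdev->dev,
        };

        return drm_sched_init(&q->sched, &args);
}

Because submit_wq is ordered, the run_job()/free_job() callbacks of these
schedulers never execute concurrently, so the ops need no extra locking.
Because timeout_wq is ordered, the first timedout_job() to run can reset the
shared HW/FW and cancel the pending timeout work of the other schedulers
before it ever executes.
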
>
> We should definitely have a lot more cross-links between the various
> pieces of this puzzle though, that's for sure :-)
>
> Cheers, Sima
>
> >
> > Regards,
> > Christian.
> >
> > >   * @score: score atomic shared with other schedulers. May be NULL.
> > >   * @name: name (typically the driver's name). Used for debugging
> > >   * @dev: associated device. Used for debugging
> > >
> --
> Simona Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch