On Wed, Jun 04, 2025 at 10:10:21AM -0700, Matthew Brost wrote:
> On Wed, Jun 04, 2025 at 06:53:44PM +0200, Danilo Krummrich wrote:
> > On Wed, Jun 04, 2025 at 09:45:00AM -0700, Matthew Brost wrote:
> > > On Wed, Jun 04, 2025 at 05:07:15PM +0200, Simona Vetter wrote:
> > > > We should definitely document this trick better though, I didn't find any
> > > > place where that was documented.
> > > 
> > > This is a good idea.
> > 
> > I think - and I also mentioned this a few times in the patch series that added
> > the workqueue support - we should also really document the pitfalls of this.
> > 
> > If the scheduler shares a workqueue with the driver, the driver needs to take
> > special care that the work it submits cannot prevent the run_job and free_job
> > work from running.
> > 
> > For instance, if it's a single threaded workqueue and the driver submits work
> > that allocates with GFP_KERNEL, this is a deadlock condition.
> > 
> > More generally, if the driver submits N work items that, for instance, allocate
> > with GFP_KERNEL, it's also a deadlock condition if N == max_active.
> 
> Can we prime lockdep on scheduler init? e.g.
> 
> fs_reclaim_acquire(GFP_KERNEL);
> workqueue_lockdep_acquire();
> workqueue_lockdep_release();
> fs_reclaim_release(GFP_KERNEL);
+1, this should do the trick. Well, strictly we need GFP_NORECLAIM, so
ideally the one below for dma_fence.

> In addition to documentation, this would prevent workqueues from being
> used that allocate with GFP_KERNEL.
> 
> Maybe we could use dma_fence_signalling annotations instead of
> fs_reclaim_acquire, but at one point those gave Xe false lockdep
> positives, so we use fs_reclaim_acquire in similar cases. Maybe that has
> been fixed though.

Yeah, the annotation is busted because it doesn't use the right recursive
version. I thought Thomas Hellstrom had a patch once, but it didn't land
yet.
-Sima
-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
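For reference, a minimal sketch of what the priming discussed above could look like at scheduler init time. This assumes a hypothetical workqueue_lockdep_acquire()/workqueue_lockdep_release() helper pair exported by the workqueue code (the workqueue's internal lockdep_map is private to kernel/workqueue.c today, so no such helpers exist in mainline); fs_reclaim_acquire()/fs_reclaim_release() are the existing mm annotations from linux/sched/mm.h:

```c
/*
 * Sketch only, not mainline code: prime lockdep once at scheduler init
 * so that any GFP_KERNEL allocation from work on the scheduler's
 * workqueue is flagged immediately, instead of only when reclaim
 * actually recurses at runtime.
 *
 * This records the dependency fs_reclaim -> wq. If a work item on wq
 * later allocates with GFP_KERNEL (wq -> fs_reclaim), lockdep sees the
 * inversion and splats.
 *
 * workqueue_lockdep_acquire()/release() are hypothetical helpers that
 * would take the workqueue's internal lockdep_map.
 */
static void drm_sched_prime_lockdep(struct workqueue_struct *wq)
{
	if (!IS_ENABLED(CONFIG_LOCKDEP))
		return;

	/* Pretend we are inside direct reclaim ... */
	fs_reclaim_acquire(GFP_KERNEL);
	/* ... and then "wait" on the scheduler's workqueue. */
	workqueue_lockdep_acquire(wq);
	workqueue_lockdep_release(wq);
	fs_reclaim_release(GFP_KERNEL);
}
```

The IS_ENABLED() guard keeps this a no-op on !CONFIG_LOCKDEP kernels; whether the dma_fence signalling annotations could replace fs_reclaim_acquire() here is exactly the open question in the thread above.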