On Wed, Jun 04, 2025 at 10:10:21AM -0700, Matthew Brost wrote:
> On Wed, Jun 04, 2025 at 06:53:44PM +0200, Danilo Krummrich wrote:
> > On Wed, Jun 04, 2025 at 09:45:00AM -0700, Matthew Brost wrote:
> > > On Wed, Jun 04, 2025 at 05:07:15PM +0200, Simona Vetter wrote:
> > > > We should definitely document this trick better though, I didn't find
> > > > any place where that was documented.
> > > 
> > > This is a good idea.
> > 
> > I think - and I also mentioned this a few times in the patch series that
> > added the workqueue support - we should also really document the pitfalls
> > of this.
> > 
> > If the scheduler shares a workqueue with the driver, the driver needs to
> > take special care that the work it submits can't prevent the scheduler's
> > run_job and free_job work from running.
> > 
> > For instance, if it's a single threaded workqueue and the driver submits
> > work that allocates with GFP_KERNEL, this is a deadlock condition.
> > 
> > More generally, if the driver submits N work items that, for instance,
> > allocate with GFP_KERNEL, it's also a deadlock condition if N == max_active.
> 
> Can we prime lockdep on scheduler init? e.g.
> 
> fs_reclaim_acquire(GFP_KERNEL);
> workqueue_lockdep_acquire();
> workqueue_lockdep_release();
> fs_reclaim_release(GFP_KERNEL);

+1, this should do the trick. Strictly we'd need GFP_NORECLAIM though, so
ideally the dma_fence annotation mentioned below.
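
Just to make the idea concrete, the priming in drm_sched_init() could look
roughly like the sketch below. This is untested and purely illustrative:
I'm assuming the scheduler's workqueue is reachable as sched->submit_wq,
and I'm (ab)using flush_workqueue() because it acquires and releases the
workqueue's lockdep map, which is what actually records the dependency
with lockdep - there's no exported helper to touch wq->lockdep_map
directly.

	/* Hypothetical sketch, somewhere at the end of drm_sched_init() */
	if (IS_ENABLED(CONFIG_LOCKDEP)) {
		fs_reclaim_acquire(GFP_KERNEL);
		/*
		 * flush_workqueue() takes sched->submit_wq's lockdep map
		 * while we hold the fs_reclaim map, so lockdep learns that
		 * waiting on this workqueue can happen under reclaim. Any
		 * work item on the same workqueue that later allocates with
		 * GFP_KERNEL closes the cycle and splats at init/test time
		 * instead of deadlocking in production.
		 */
		flush_workqueue(sched->submit_wq);
		fs_reclaim_release(GFP_KERNEL);
	}

One side effect to keep in mind: flushing here also drains whatever the
driver already queued on a shared workqueue, which should be harmless at
init time but is worth a comment in the real patch.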

> In addition to documentation, this would prevent workqueues from being
> used that allocate with GFP_KERNEL.
> 
> Maybe we could use dma_fence_signaling annotations instead of
> fs_reclaim_acquire, but at one point those gave Xe false lockdep
> positives, so we use fs_reclaim_acquire in similar cases. Maybe that has
> been fixed though.

Yeah the annotation is busted because it doesn't use the right recursive
version. I thought Thomas Hellstrom had a patch once, but it didn't land
yet.
-Sima
-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch