On Tue, 2025-04-22 at 13:13 +0200, Danilo Krummrich wrote: > On Tue, Apr 22, 2025 at 11:39:11AM +0100, Tvrtko Ursulin wrote: > > Question I raised is if there are other drivers which manage to > > clean up > > everything correctly (like the mock scheduler does), but trigger > > that > > warning. Maybe there are not and maybe mock scheduler is the only > > false > > positive.
For clarification: I messed up the comment from the cover letter. What I did was run the *old* unit tests (v5 IIRC) from Tvrtko that still had the memory leaks. Those then trigger the warning print, as is expected, since they don't provide fence_context_kill(). The current unit tests are fine memory-leak-wise. IOW, both with Nouveau and the unit tests, everything behaves as expected, without issues. Sorry for the confusion. P. > > So far the scheduler simply does not give any guideline on how to > address the > problem, hence every driver simply does something (or nothing, > effectively > ignoring the problem). This is what we want to fix. > > The mock scheduler keeps it's own list of pending jobs and on tear > down stops > the scheduler's workqueues, traverses it's own list and eventually > frees the > pending jobs without updating the scheduler's internal pending list. > > So yes, it does avoid memory leaks, but it also leaves the schedulers > internal > structures with an invalid state, i.e. the pending list of the > scheduler has > pointers to already freed memory. > > What if the drm_sched_fini() starts touching the pending list? Then > you'd end up > with UAF bugs with this implementation. We cannot invalidate the > schedulers > internal structures and yet call scheduler functions - e.g. > drm_sched_fini() - > subsequently. > > Hence, the current implementation of the mock scheduler is > fundamentally flawed. > And so would be *every* driver that still has entries within the > scheduler's > pending list. > > This is not a false positive, it already caught a real bug -- in the > mock > scheduler. > > - Danilo