On 22/04/2025 13:00, Philipp Stanner wrote:
On Tue, 2025-04-22 at 13:13 +0200, Danilo Krummrich wrote:
On Tue, Apr 22, 2025 at 11:39:11AM +0100, Tvrtko Ursulin wrote:
Question I raised is if there are other drivers which manage to
clean up
everything correctly (like the mock scheduler does), but trigger
that
warning. Maybe there are not and maybe mock scheduler is the only
false
positive.
For clarification:
I messed up the comment from the cover letter.
What I did was run the *old* unit tests (v5 IIRC) from Tvrtko that
still had the memory leaks. Those then trigger the warning print, as is
expected, since they don't provide fence_context_kill().
The current unit tests are fine memory-leak-wise.
IOW, both with Nouveau and the unit tests, everything behaves as
expected, without issues.
Hmm how? Pending list can be non-empty so it should be possible to hit
that warn.
Regards,
Tvrtko
So far the scheduler simply does not give any guideline on how to
address the
problem, hence every driver simply does something (or nothing,
effectively
ignoring the problem). This is what we want to fix.
The mock scheduler keeps it's own list of pending jobs and on tear
down stops
the scheduler's workqueues, traverses it's own list and eventually
frees the
pending jobs without updating the scheduler's internal pending list.
So yes, it does avoid memory leaks, but it also leaves the schedulers
internal
structures with an invalid state, i.e. the pending list of the
scheduler has
pointers to already freed memory.
What if the drm_sched_fini() starts touching the pending list? Then
you'd end up
with UAF bugs with this implementation. We cannot invalidate the
schedulers
internal structures and yet call scheduler functions - e.g.
drm_sched_fini() -
subsequently.
Hence, the current implementation of the mock scheduler is
fundamentally flawed.
And so would be *every* driver that still has entries within the
scheduler's
pending list.
This is not a false positive, it already caught a real bug -- in the
mock
scheduler.
- Danilo