Hi all,

Recently I mentioned some fence lifetime issues to Danilo, so here is a rough series, intended more than anything to start the discussion.

Most of the problem statement can be found in the first patch, but to briefly summarise - because a sched fence can outlive the scheduler, we can trivially engineer a use-after-free with xe and possibly other drivers. All that is needed is to convert a syncobj into a sync file behind the driver's back, and I don't see what the driver can do about it.

IGT that exploits the problem:

https://patchwork.freedesktop.org/patch/642709/?series=146211&rev=2

A different flavour of the problem arises if we had a close(drm_fd) in that test before the sleep. In that case we can even unload xe.ko and gpu-sched.ko for even more fun. The last two patches in the series close that gap.

The first two patches, however, only shrink the race window. They are not proper fixes. This is what I want to discuss, since I understand that reference counting all the involved objects has been rejected in the past. And since the problem probably extends to all dma fences, it certainly isn't easy.

To be clear once more - let's not focus on how this does not fix it fully - I am primarily trying to start the conversation.

Cc: Christian König <christian.koe...@amd.com>
Cc: Danilo Krummrich <d...@kernel.org>
Cc: Lucas De Marchi <lucas.demar...@intel.com>
Cc: Matthew Brost <matthew.br...@intel.com>
Cc: Philipp Stanner <pha...@kernel.org>
Cc: Rodrigo Vivi <rodrigo.v...@intel.com>

Tvrtko Ursulin (4):
  sync_file: Weakly paper over one use-after-free resulting race
  dma-fence: Slightly safer dma_fence_set_deadline
  drm/sched: Keep module reference while there are active fences
  drm/xe: Keep module reference while there are active fences

 drivers/dma-buf/dma-fence.c             |  2 +-
 drivers/dma-buf/sync_file.c             | 29 ++++++++++++++++++++-----
 drivers/gpu/drm/scheduler/sched_fence.c | 12 ++++++++--
 drivers/gpu/drm/xe/xe_hw_fence.c        | 13 ++++++++++-
 4 files changed, 47 insertions(+), 9 deletions(-)

--
2.48.0