On Tue, 2026-01-13 at 16:16 +0100, Christian König wrote:
> Hi everyone,
> 
> dma_fences have always lived under the tyranny of their issuer's module
> lifetime, leading to crashes if anybody still holds a reference to a
> dma_fence when the issuer's module is unloaded.
> 
> The basic problem is that when buffers are shared between drivers,
> dma_fence objects can leak into external drivers and stay there even
> after they are signaled. The dma_resv object, for example, only lazily
> releases dma_fences.
> 
> So what happens is that when the module which originally created the
> dma_fence unloads, the dma_fence_ops function table becomes unavailable as
> well, and so any attempt to release the fence crashes the system.
> 
> Previously, various approaches have been discussed, including changing the
> locking semantics of the dma_fence callbacks (by me) as well as using the
> drm scheduler as an intermediate layer (by Sima) to disconnect dma_fences
> from their actual users, but none of them actually solves all the problems.
> 
> Tvrtko did some really nice prerequisite work by protecting the returned
> strings of the dma_fence_ops by RCU. This way dma_fence creators were
> able to just wait for an RCU grace period after fence signaling before
> it was safe to free those data structures.
> 
> Now this patch set goes a step further and protects the whole
> dma_fence_ops structure by RCU, so that after the fence signals, the
> pointer to the dma_fence_ops is set to NULL when neither a wait nor a
> release callback is given. All functionality which uses the dma_fence_ops
> reference is put inside an RCU critical section, except for the
> deprecated issuer-specific wait and of course the optional release
> callback.
> 
> In addition to the RCU changes: the lock protecting the dma_fence state
> previously had to be allocated externally. This set now changes the
> functionality to make that external lock optional and allows dma_fences
> to use an inline lock and be self-contained.
> 
> v4:
> 
> Rebases the whole set on upstream changes, especially the cleanup
> from Philip in patch "drm/amdgpu: independence for the amdkfd_fence!".
> 
> Add two patches which bring the DMA-fence self tests up to date.
> The first selftest change removes the mock_wait and so actually starts
> testing the default behavior instead of some hacky implementation in the
> test. This one got upstreamed independently of this set.
> The second drops the mock_fence as well and tests the new RCU and inline
> spinlock functionality.
> 
> v5:
> 
> Rebase on top of drm-misc-next instead of drm-tip, leave out all driver
> changes for now since those should go through the driver specific paths
> anyway.
> 
> Address a few more review comments, especially some rebase mess and
> typos. And finally fix one more bug found by AMD's CI system.
> 
> Especially the first patch still needs a Reviewed-by, apart from that I
> think I've addressed all review comments and problems.
> 
> Please review and comment,
> Christian.
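
The two behaviors described in the cover letter (clearing the ops pointer after signaling, and falling back to an inline lock) can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel implementation: every `model_*` and `demo_*` name is hypothetical, C11 atomics stand in for RCU and spinlocks, and the `"signaled-stub"` string is a placeholder chosen for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for dma_fence_ops; NOT the kernel definition. */
struct model_fence_ops {
    const char *(*get_driver_name)(void);
    void (*wait)(void);    /* models the deprecated issuer-specific wait */
    void (*release)(void); /* models the optional release callback */
};

struct model_fence {
    _Atomic(const struct model_fence_ops *) ops;
    atomic_flag *lock;       /* external lock, or &inline_lock */
    atomic_flag inline_lock; /* used when no external lock is given */
    bool signaled;
};

/* Passing NULL as external_lock makes the fence self-contained. */
static void model_fence_init(struct model_fence *f,
                             const struct model_fence_ops *ops,
                             atomic_flag *external_lock)
{
    atomic_init(&f->ops, ops);
    atomic_flag_clear(&f->inline_lock);
    f->lock = external_lock ? external_lock : &f->inline_lock;
    f->signaled = false;
}

static void model_fence_lock(struct model_fence *f)
{
    while (atomic_flag_test_and_set_explicit(f->lock, memory_order_acquire))
        ; /* spin until the flag is released */
}

static void model_fence_unlock(struct model_fence *f)
{
    atomic_flag_clear_explicit(f->lock, memory_order_release);
}

/*
 * After signaling, drop the ops pointer unless a wait or release
 * callback still needs it. The kernel additionally relies on RCU
 * grace periods, which this model does not reproduce.
 */
static void model_fence_signal(struct model_fence *f)
{
    const struct model_fence_ops *ops;

    model_fence_lock(f);
    f->signaled = true;
    ops = atomic_load(&f->ops);
    if (ops && !ops->wait && !ops->release)
        atomic_store(&f->ops, (const struct model_fence_ops *)NULL);
    model_fence_unlock(f);
}

/* Readers must tolerate a NULL ops pointer once the fence has signaled. */
static const char *model_fence_driver_name(struct model_fence *f)
{
    const struct model_fence_ops *ops = atomic_load(&f->ops);

    return ops ? ops->get_driver_name() : "signaled-stub";
}

/* Demo ops tables: one without callbacks, one with a release callback. */
static const char *demo_name(void) { return "demo"; }
static void demo_release(void) { }

static const struct model_fence_ops demo_ops = {
    .get_driver_name = demo_name,
};
static const struct model_fence_ops demo_ops_release = {
    .get_driver_name = demo_name,
    .release = demo_release,
};
```

In this model a fence initialized with `demo_ops` and no external lock loses its ops pointer once `model_fence_signal()` runs, so later readers never touch the issuer's function table. A fence using `demo_ops_release` keeps the pointer, which mirrors the exception for the release callback mentioned in the cover letter.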


You forgot Danilo, who is also a drm_sched maintainer.
+Cc.

P.
