On Tue, 2026-01-13 at 16:16 +0100, Christian König wrote:
> Hi everyone,
>
> dma_fences have always lived under the tyranny dictated by the module
> lifetime of their issuer, leading to crashes should anybody still hold
> a reference to a dma_fence when the module of the issuer is unloaded.
>
> The basic problem is that when buffers are shared between drivers,
> dma_fence objects can leak into external drivers and stay there even
> after they are signaled. The dma_resv object, for example, only lazily
> releases dma_fences.
>
> So what happens is that when the module which originally created the
> dma_fence unloads, the dma_fence_ops function table becomes unavailable
> as well, and so any attempt to release the fence crashes the system.
>
> Previously various approaches have been discussed, including changing the
> locking semantics of the dma_fence callbacks (by me) as well as using the
> drm scheduler as an intermediate layer (by Sima) to disconnect dma_fences
> from their actual users, but none of them actually solves all problems.
>
> Tvrtko did some really nice prerequisite work by protecting the returned
> strings of the dma_fence_ops by RCU. This way dma_fence creators were
> able to just wait for an RCU grace period after fence signaling before
> it was safe to free those data structures.
>
> Now this patch set goes a step further and protects the whole
> dma_fence_ops structure by RCU, so that after the fence signals the
> pointer to the dma_fence_ops is set to NULL when there is neither a wait
> nor a release callback given. All functionality which uses the
> dma_fence_ops reference is put inside an RCU critical section, except for
> the deprecated issuer specific wait and of course the optional release
> callback.
>
> In addition to the RCU changes, the lock protecting the dma_fence state
> previously had to be allocated externally. This set now changes the
> functionality to make that external lock optional and allows dma_fences
> to use an inline lock and be self contained.
>
> v4:
>
> Rebases the whole set on upstream changes, especially the cleanup
> from Philip in patch "drm/amdgpu: independence for the amdkfd_fence!".
>
> Adds two patches which bring the DMA-fence self tests up to date.
> The first selftest change removes the mock_wait and so actually starts
> testing the default behavior instead of some hacky implementation in the
> test. This one got upstreamed independently of this set.
> The second drops the mock_fence as well and tests the new RCU and inline
> spinlock functionality.
>
> v5:
>
> Rebase on top of drm-misc-next instead of drm-tip, leave out all driver
> changes for now since those should go through the driver specific paths
> anyway.
>
> Address a few more review comments, especially some rebase mess and
> typos. And finally fix one more bug found by AMD's CI system.
>
> Especially the first patch still needs a Reviewed-by, apart from that I
> think I've addressed all review comments and problems.
>
> Please review and comment,
> Christian.

You forgot Danilo, who is also a drm_sched maintainer. +Cc.

P.
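
For anyone joining the thread via the Cc, the detach-on-signal idea from the
cover letter can be modeled roughly as below. This is only a user-space
sketch with C11 atomics, not the code from the series: every sketch_* name is
hypothetical, the "detached" string is a placeholder, and the atomic load
merely stands in for what would be rcu_read_lock() / rcu_dereference() /
rcu_read_unlock() in the kernel. The grace period the issuer has to wait for
before unloading is not modeled at all.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct dma_fence_ops. */
struct sketch_fence_ops {
	const char *(*get_driver_name)(void);
	bool (*wait)(void *fence);	/* optional issuer specific wait */
	void (*release)(void *fence);	/* optional release callback */
};

/* Hypothetical stand-in for struct dma_fence. */
struct sketch_fence {
	_Atomic(const struct sketch_fence_ops *) ops;
	atomic_bool signaled;
};

static void sketch_fence_init(struct sketch_fence *f,
			      const struct sketch_fence_ops *ops)
{
	atomic_init(&f->ops, ops);
	atomic_init(&f->signaled, false);
}

/*
 * Signal the fence. If the issuer provided neither a wait nor a release
 * callback, the ops pointer is cleared so that nothing references the
 * issuer's function table any more.
 */
static void sketch_fence_signal(struct sketch_fence *f)
{
	const struct sketch_fence_ops *ops = atomic_load(&f->ops);

	atomic_store(&f->signaled, true);
	if (ops && !ops->wait && !ops->release)
		atomic_store(&f->ops, NULL);
}

/* True once the fence no longer points at the issuer's ops table. */
static bool sketch_fence_is_detached(struct sketch_fence *f)
{
	return atomic_load(&f->ops) == NULL;
}

/*
 * Reader side: load the ops pointer once and handle NULL. In the kernel
 * this load and the call through it would sit in an RCU read-side
 * critical section.
 */
static const char *sketch_fence_driver_name(struct sketch_fence *f)
{
	const struct sketch_fence_ops *ops = atomic_load(&f->ops);

	return ops ? ops->get_driver_name() : "detached";
}

/* Demo issuers, purely for illustration. */
static const char *sketch_demo_name(void) { return "demo"; }
static void sketch_demo_release(void *fence) { (void)fence; }

/* No wait, no release: gets detached on signal. */
static const struct sketch_fence_ops sketch_demo_ops = {
	.get_driver_name = sketch_demo_name,
};

/* Has a release callback: keeps its ops pointer after signaling. */
static const struct sketch_fence_ops sketch_pinned_ops = {
	.get_driver_name = sketch_demo_name,
	.release = sketch_demo_release,
};
```

In this model a fence using sketch_demo_ops reports "detached" after
signaling, while one using sketch_pinned_ops keeps pointing at its issuer,
which is the case where the module lifetime still matters.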
