On Tue, 17 Mar 2026 15:48:25 +0100
"Christian König" <[email protected]> wrote:

> In case of a refcounting bug dma_fence_release() can be called before the
> fence was even signaled.
> 
> Previously the dma_fence framework then force signaled the fence to make
> sure to unblock waiters, but that can potentially lead to random memory
> corruption when the DMA operation continues. So be more defensive here and
> pick the lesser evil.
> 
> Instead of force signaling the fence set an error code on the fence,
> re-initialize the refcount to something large and taint the kernel.
> 
> This will leak memory and eventually can cause a deadlock when the fence
> is never signaled, but at least we won't run into an use after free or
> random memory corruption.
> 
> Signed-off-by: Christian König <[email protected]>
> ---
>  drivers/dma-buf/dma-fence.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index 1826ba73094c..8bf07685a053 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -593,14 +593,24 @@ void dma_fence_release(struct kref *kref)
>               /*
>                * Failed to signal before release, likely a refcounting issue.
>                *
> -              * This should never happen, but if it does make sure that we
> -              * don't leave chains dangling. We set the error flag first
> -              * so that the callbacks know this signal is due to an error.
> +              * This should never happen, but if try to be defensive and take
> +              * the lesser evil. Initialize the refcount to something large,
> +              * but not so large that it can overflow.
> +              *
> +              * That will leak memory and could deadlock if the fence never
> +              * signals, but at least it doesn't cause an use after free or
> +              * random memory corruption.
> +              *
> +              * Also taint the kernel to note that it is rather unreliable to
> +              * continue.
>                */
>               dma_fence_lock_irqsave(fence, flags);
>               fence->error = -EDEADLK;
> -             dma_fence_signal_locked(fence);
> +             refcount_set(&fence->refcount.refcount, INT_MAX);

I'm not convinced this is useful. If we leak the object, no one should
have a ref to release anyway. This does raise a question though. The
case we're trying to protect against is fence_callback being registered
to this fence and waiting for an event to signal another proxy fence.
How can the refcnt drop to zero in that case? Isn't the proxy supposed
to own a ref on the fence. Before we go further, I'd like to understand
what we're trying to do.

The original discussion that led you to write this patch was about
detecting when a fence emitter/producer would leave unsignalled fences
behind, and the problem we have is when such unsignalled fences have
observers waiting for a "signalled" event. If the refcnt drops to zero
and the fence is released, we're already passed that point,
unfortunately. It can be that:

- the fence was never exposed -> this is fine
- the fence was exposed but never observed -> this is broken, because if
  it had been observed it would have led to a deadlock
- the fence was exposed, observed for some time, but the observer got
  bored, stopped waiting and:
  * decided to go and execute its stuff anyway -> use-before-ready
    situation
  * gave up -> kinda okay, but we should still consider the fence
    emitter broken
- the fence observer registered a callback but didn't take a ref on the
  object -> this is potential UAF on the dma_fence, which can also lead
  to a VRAM/system-mem UAF if the emitter drops the dma_fence without
  signalling, because of the auto-signal you're getting rid of in this
  patch.  But the latter is just a side effect of the dma_fence UAF,
  which I'm not convinced we should try to protect against.

>               dma_fence_unlock_irqrestore(fence, flags);
> +             rcu_read_unlock();
> +             add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
> +             return;
>       }
>  
>       ops = rcu_dereference(fence->ops);

Reply via email to