Re: [PATCH 3/4] rust: Add dma_fence abstractions

Daniel Almeida Fri, 05 Jun 2026 09:00:12 -0700


>> 
>>>> 
>>>> Consider a fence design where signal() consumes self. Now consider this:
>>>> 
>>>> ```
>>>> impl FenceCb for MyCallback {
>>>> fn called(&mut self) {
>>>>  // Can't move the fence out, so we have to put an Option<T> just to be 
>>>> able
>>>>  // to move.
>>>>  if let Some(f) = self.some_fence.take() {
>>>>    f.signal();
>>>>  }
>>>> }
>>>> ```
>>>> 
>>>> This used to be the case when our version of the job queue used the "proxy
>>>> fence" design:
>>>> 
>>>> 
>>>> ```
>>>> // Callback on the hw fence
>>>> impl FenceCb for MyCallback {
>>>> fn called(&mut self) {
>>>>  if let Some(f) = self.submit_fence.take() {
>>>>    f.signal();
>>>>  }  
>>> 
>>> I'm pretty sure lockdep won't like it anyway, because this is nested
>>> locking of the same lock class. For such proxies, we'll need to teach
>>> lockdep about the nesting like has been recently done on
>>> dma_fence_array & co. But I'm digressing.  
>> 
>> Yeah, but this is more about resource transfer in general, not
>> this pattern specifically.
>> 
>> I agree that this has issues, and yes, lockdep complained back
>> then :)
> 
> The thing is, there's so many aspects that could go wrong because of the
> context this callback is called in. Nested locking is one of them,
> the fact we can't sleep is another. And with rust it's even worse,
> because of the implicit drops that will happen when you take ownership
> of resources (taking sleeping locks to remove resources from a dataset
> for instance).
> 
> So, by passing self by value to the ::callback(), you're basically
> telling users "hey, BTW, don't forget to defer the drop to some
> workqueue if you think it's not atomic-safe". And how can users know
> that the thing they're about to drop can be dropped in atomic context?



Can’t we create a token type that signals we’re in atomic context? If
we pass this token type as an argument, perhaps lockdep can check it for us?

Perhaps

enum SomeFancyName {
  Atomic(AtomicToken)
  NotAtomic,
}

signaled(self, atomic: AtomicToken) {
  ..
}

> They basically have to audit the ::drop() of all the resources they
> embed in their type implementing FenceCb. Not only that, but they also
> have to design the thing so the deferral of this ::drop() doesn't
> allocate, because, obviously, allocating in atomic context is
> tricky/fallible. AFAIK, none of this can be spot at compile-time (I
> remember Gary/Danilo mentioning that we could teach the klint about
> some of these rules). This would leave us with runtime checks like
> might_sleep(), but most of the C putters (xxx_put(object)) don't have
> might_sleep() in the path where the decref doesn't lead to a refcnt=0
> situation.
> 
> TLDR; Call this PTSD if you want, but this is the sort of bugs I
> struggled with on the C side, and I can predict that the exact same
> will happen in rust drivers if we expose the FenceCb as it is designed
> here and we don't have a way to check the soundness of the FenceCb
> implementations at compile time.
> 
> The other option (the one I've been advocating for from the start), is
> to not let drivers implement FenceCb (make it private), but instead
> have a bunch of implementations that we know are safe. Here's a list of
> implementations that I think would unblock most of the drivers use
> cases:
> 
> - wakeup a thread
> - complete a completion object
> - schedule a WorkItem
> - schedule a kthread_worker (once we get a proper rust abstraction for
>  that)

This can also work too, I guess.

> 
> It doesn't mean we can't have optimized FenceCb implementations that do
> a lot more in the callback() path instead of deferring to a
> workqueue/thread, but at least those would have to be implemented in
> dma_fence.rs, and the dma_fence.rs maintainers can then carefully audit
> the code as part of the review process, which we know is not really the
> case when changes touch drivers code only.
> 
> FWIW, I think the FenceProxy design you were describing falls into
> this "must be carefully audited" bucket, and should be implemented in
> dma_fence.rs.
> 
>> 
>>> 
>>>> }
>>>> ```
>>>> 
>>>> Although this is not the case anymore, since we phased out this design 
>>>> given
>>>> Christian's recent work. Still, we should ideally not require Option<T> 
>>>> here in
>>>> general just to make resource transfer possible.  
>>> 
>>> I see. OTOH, don't we need to make this inner data movable if we want
>>> to cancel the FenceCb before the fence is signaled anyway? And that's
>>> most certainly a case we have in the teardown path.  
>> 
>> Can you expand a bit on what you mean here?
> 
> Never mind, I was confusing two different iterations of the code here.
> I thought the Option<T> you were mentioning was in
> FenceCbRegistration<T>, with some explicit ::cancel() function that
> would return Option<T> so the user can get its resources back when it
> cancels the registration, and also know whether the callback was called
> or not. But this is all gone now, and all we can do is drop the
> registration, which will automatically drop the inner T.

Re: [PATCH 3/4] rust: Add dma_fence abstractions

Reply via email to