> On 5 Jun 2026, at 04:56, Philipp Stanner <[email protected]> wrote:
>
> On Thu, 2026-06-04 at 10:15 +0200, Boris Brezillon wrote:
>> On Wed, 3 Jun 2026 21:43:05 -0300
>> Daniel Almeida <[email protected]> wrote:
>>
>>>> On 3 Jun 2026, at 14:14, Boris Brezillon <[email protected]>
>>>> wrote:
>>>>
>>>> On Wed, 3 Jun 2026 13:41:02 -0300
>>>> Daniel Almeida <[email protected]> wrote:
>>>>
>>>>>> + /// Called when the fence is signaled.
>>>>>> + ///
>>>>>> + /// This is called from the fence signaling path, which may be in
>>>>>> interrupt
>>>>>> + /// context or with locks held, which is why `self` is only
>>>>>> borrowed, so that
>>>>>> + /// it cannot drop. Implementations must not sleep or perform
>>>>>> + /// long-running operations.
>>>>>> + ///
>>>>>> + /// An implementation likely wants to inform itself (e.g., through
>>>>>> a work item)
>>>>>> + /// within this callback that the associated
>>>>>> [`FenceCbRegistration`] can now be
>>>>>> + /// dropped.
>>>>>> + fn called(&mut self);
>>>>>
>>>>> This is a central point. We ideally would want this to consume self,
>>>>> because we
>>>>> may want to move things out of the callback.
>>>>
>>>> This one comes from me. The rationale being that ::called() is called
>>>> from an atomic context, and the resources attached to the callback data
>>>> might require acquiring other sleeping locks to be released, and
>>>> sometimes you don't even notice immediately because said resources are
>>>> refcounted, and the lock is only acquired when you happen to be the
>>>> last owner. Yes, those can be caught at runtime if the C side is
>>>> properly annotated with might_sleep(), but that's not always the case.
>>>>
>>>> If we defer the drop of the data only when the FenceCb is
>>>> dropped/recycled, we're at least not constrained by this "runs in
>>>> atomic context" thing.
>>>>
>>>
>>> This design does not solve it, because one can quite trivially get around
>>> this
>>> restriction using Option<T> as I said. If your point is “don’t run any
>>> drop() here”,
>>> then &mut self doesn’t do it.
>>
>> My bad, I thought you were talking about some Option<T> in
>> FenceCbRegistration<T> (there was one at some point, but it's gone now),
>> but you're talking about having an Option<X> inside the T. Yes, there's
>> indeed nothing preventing a drop on X in that path, and it's just as
>> bad as passing the fence back as value to the callback in that case.
>
> Then maybe we should just pass it by value and require implementation
> of an unsafe trait on `T`, whose safety-requirements demand that this
> must be save to drop from atomic context.
>
>>
>>>
>>>>>
>>>>> Consider a fence design where signal() consumes self. Now consider this:
>>>>>
>>>>> ```
>>>>> impl FenceCb for MyCallback {
>>>>> fn called(&mut self) {
>>>>> // Can't move the fence out, so we have to put an Option<T> just to be
>>>>> able
>>>>> // to move.
>>>>> if let Some(f) = self.some_fence.take() {
>>>>> f.signal();
>>>>> }
>>>>> }
>>>>> ```
>>>>>
>>>>> This used to be the case when our version of the job queue used the "proxy
>>>>> fence" design:
>>>>>
>>>>>
>>>>> ```
>>>>> // Callback on the hw fence
>>>>> impl FenceCb for MyCallback {
>>>>> fn called(&mut self) {
>>>>> if let Some(f) = self.submit_fence.take() {
>>>>> f.signal();
>>>>> }
>>>>
>>>> I'm pretty sure lockdep won't like it anyway, because this is nested
>>>> locking of the same lock class. For such proxies, we'll need to teach
>>>> lockdep about the nesting like has been recently done on
>>>> dma_fence_array & co. But I'm digressing.
>>>
>>> Yeah, but this is more about resource transfer in general, not
>>> this pattern specifically.
>>>
>>> I agree that this has issues, and yes, lockdep complained back
>>> then :)
>>
>> The thing is, there's so many aspects that could go wrong because of the
>> context this callback is called in. Nested locking is one of them,
>> the fact we can't sleep is another. And with rust it's even worse,
>> because of the implicit drops that will happen when you take ownership
>> of resources (taking sleeping locks to remove resources from a dataset
>> for instance).
>
> Doesn't that have to be a problem for much of Rust infrastructure? How
> do other parties solve that?
>
>>
>> So, by passing self by value to the ::callback(), you're basically
>> telling users "hey, BTW, don't forget to defer the drop to some
>> workqueue if you think it's not atomic-safe". And how can users know
>> that the thing they're about to drop can be dropped in atomic context?
>> They basically have to audit the ::drop() of all the resources they
>> embed in their type implementing FenceCb. Not only that, but they also
>> have to design the thing so the deferral of this ::drop() doesn't
>> allocate, because, obviously, allocating in atomic context is
>> tricky/fallible. AFAIK, none of this can be spot at compile-time (I
>> remember Gary/Danilo mentioning that we could teach the klint about
>> some of these rules). This would leave us with runtime checks like
>> might_sleep(), but most of the C putters (xxx_put(object)) don't have
>> might_sleep() in the path where the decref doesn't lead to a refcnt=0
>> situation.
>>
>> TLDR; Call this PTSD if you want, but this is the sort of bugs I
>> struggled with on the C side, and I can predict that the exact same
>> will happen in rust drivers if we expose the FenceCb as it is designed
>> here and we don't have a way to check the soundness of the FenceCb
>> implementations at compile time.
>
> My guess would be that the existence of unsafe-traits is the admission
> of Rust that this just cannot be guaranteed by design.
>
> If a driver cannot know whether this or that is safe to drop, then it
> would have to defer it's dropping. Or would there be cases where this
> also doesn't work?
Although I totally understand where Boris is coming from here, and I agree with
him, the reality is that the current &mut self design doesn’t solve this. An
unsafe trait could work as a pinky-promise by drivers, which is half-way there.
What we ideally would like to have is a bound though, something like:
T: !Drop
If I recall correctly there were people working to get support for that on
Rust? I think there are two things here: !Trait, which is not supported except
for !Sized IIRC, and having an auto trait that represents types that implement
Drop, similar to Send and Sync.
>
>>
>> The other option (the one I've been advocating for from the start), is
>> to not let drivers implement FenceCb (make it private), but instead
>> have a bunch of implementations that we know are safe. Here's a list of
>> implementations that I think would unblock most of the drivers use
>> cases:
>>
>> - wakeup a thread
>> - complete a completion object
>> - schedule a WorkItem
>> - schedule a kthread_worker (once we get a proper rust abstraction for
>> that)
>>
>> It doesn't mean we can't have optimized FenceCb implementations that do
>> a lot more in the callback() path instead of deferring to a
>> workqueue/thread, but at least those would have to be implemented in
>> dma_fence.rs, and the dma_fence.rs maintainers can then carefully audit
>> the code as part of the review process, which we know is not really the
>> case when changes touch drivers code only.
>
> Pragmatically speaking, if the common cases are trivial, then the
> drivers will get them right, because those critical primitives are
> already atomic-safe.
No, you can still fumble trivial code, specially when it has to be cargo-culted
from somewhere else.
I am not saying that having things in dma_fence . rs is the solution, although
it is definitely one solution. But the argument above isn’t very strong
IMHO.
>
> And a non-common case will have to be implemented in the driver
> anyways, so we'd have to allow for that.
>
>
>
> P.