On 21/08/2025 13:46, Christian König wrote:
On 13.08.25 13:59, Tvrtko Ursulin wrote:
Good point. Could be that someone is using a pure kernel thread for fence 
signaling. Going to check for that instead of a worker.

Ok, I am curious how to do it reliably. Non-null current and PF_KTHREAD and 
PF_WQ_WORKER not set?

If I'm not completely mistaken (current->flags & PF_KTHREAD) should do it, but 
I need to double check that as well.

Even the fact opportunistic signalling needs to bypass the assert makes it 
sound like there isn't anything fundamentally wrong with signalling from task 
context.


Opportunistic signaling can happen from everywhere. But when an implementation 
tries to signal it from an IOCTL that is certainly invalid.

The first patch also feels a bit too much if it has no purpose apart from 
checking the new asserts would otherwise trigger.

The sw_sync code is is only used for testing and debugging. See the Kconfig of 
it:

            A sync object driver that uses a 32bit counter to coordinate
            synchronization.  Useful when there is no hardware primitive backing
            the synchronization.

            WARNING: improper use of this can result in deadlocking kernel
            drivers from userspace. Intended for test and debug only.


But it is adding kernel code for a questionable benefit.

The whole sw_sync is questionable to begin with.

We had that for a couple of years already until we finally realized the problem 
with the infinite fences.

Today it should only be used for unit testing.

What about calling the non-asserting version instead of adding complexity? It 
is kernel code so should be fine.

That would give other implementations both the possibility and impression that 
this is ok. And that is something I would really like to avoid.

I think a "big fat" comment next to the (only?) caller should be enough and on balance better than complicating the code for a debug component, which even adds an artificial/misleading limitation. I mean a complication plus limitation, with the sole purpose of making the assert feasible does not sound like the best option.

Perhaps a patch to checkpatch too to flag "no new users of this api should be added"? I wonder if there is a generic way to tie checkpatch with a list of deprecated API and if not should one be proposed.

Regards,

Tvrtko

Because even with the worker sw_sync can still create infinite fences. Worker 
just defeats the asserts so I do not see the value in complicating it.

Yeah, I mean completely ripping out the sw_sync would be indeed better. But 
that would break existing unit tests.

Regards,
Christian.


Regards,

Tvrtko


Signed-off-by: Christian König <ckoe...@able.fritz.box>
---
    drivers/dma-buf/dma-fence.c | 59 +++++++++++++++++++++++++++----------
    include/linux/dma-fence.h   |  9 ++++--
    2 files changed, 51 insertions(+), 17 deletions(-)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 3f78c56b58dc..2bce620eacac 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -345,33 +345,23 @@ void __dma_fence_might_wait(void)
    }
    #endif
    -
    /**
- * dma_fence_signal_timestamp_locked - signal completion of a fence
+ * dma_fence_signal_internal - internal signal completion of a fence
     * @fence: the fence to signal
     * @timestamp: fence signal timestamp in kernel's CLOCK_MONOTONIC time 
domain
     *
- * Signal completion for software callbacks on a fence, this will unblock
- * dma_fence_wait() calls and run all the callbacks added with
- * dma_fence_add_callback(). Can be called multiple times, but since a fence
- * can only go from the unsignaled to the signaled state and not back, it will
- * only be effective the first time. Set the timestamp provided as the fence
- * signal timestamp.
- *
- * Unlike dma_fence_signal_timestamp(), this function must be called with
- * &dma_fence.lock held.
+ * Internal signal the dma_fence without error checking. Should *NEVER* be used
+ * by drivers or external code directly.
     *
     * Returns 0 on success and a negative error value when @fence has been
     * signalled already.
     */
-int dma_fence_signal_timestamp_locked(struct dma_fence *fence,
-                      ktime_t timestamp)
+int dma_fence_signal_internal(struct dma_fence *fence, ktime_t timestamp)
    {
        struct dma_fence_cb *cur, *tmp;
        struct list_head cb_list;
          lockdep_assert_held(fence->lock);
-
        if (unlikely(test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
                          &fence->flags)))
            return -EINVAL;
@@ -390,7 +380,46 @@ int dma_fence_signal_timestamp_locked(struct dma_fence 
*fence,
          return 0;
    }
-EXPORT_SYMBOL(dma_fence_signal_timestamp_locked);
+EXPORT_SYMBOL(dma_fence_signal_internal);
+
+/**
+ * dma_fence_signal_timestamp_locked - signal completion of a fence
+ * @fence: the fence to signal
+ * @timestamp: fence signal timestamp in kernel's CLOCK_MONOTONIC time domain
+ *
+ * Signal completion for software callbacks on a fence, this will unblock
+ * dma_fence_wait() calls and run all the callbacks added with
+ * dma_fence_add_callback(). Can be called multiple times, but since a fence
+ * can only go from the unsignaled to the signaled state and not back, it will
+ * only be effective the first time. Set the timestamp provided as the fence
+ * signal timestamp.
+ *
+ * Unlike dma_fence_signal_timestamp(), this function must be called with
+ * &dma_fence.lock held.
+ *
+ * Returns 0 on success and a negative error value when @fence has been
+ * signalled already.
+ */
+int dma_fence_signal_timestamp_locked(struct dma_fence *fence,
+                      ktime_t timestamp)
+{
+    /*
+     * We have the re-occurring problem that people try to invent a
+     * DMA-fences implementation which signals fences based on an userspace
+     * IOCTL.
+     *
+     * This is well known as source of hard to track down crashes and is
+     * documented to be an invalid approach. The problem is that it seems
+     * to work during initial testing and only long term tests points out
+     * why this can never work correctly.
+     *
+     * So give at least a warning when people try to signal a fence from
+     * task context and not from interrupts or a work item. This check is
+     * certainly not perfect but better than nothing.
+     */
+    WARN_ON_ONCE(!in_interrupt() && !current_work());
+    return dma_fence_signal_internal(fence, timestamp);
+}
      /**
     * dma_fence_signal_timestamp - signal completion of a fence
diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index 64639e104110..8dbcd66989b8 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -369,6 +369,7 @@ int dma_fence_signal_locked(struct dma_fence *fence);
    int dma_fence_signal_timestamp(struct dma_fence *fence, ktime_t timestamp);
    int dma_fence_signal_timestamp_locked(struct dma_fence *fence,
                          ktime_t timestamp);
+int dma_fence_signal_internal(struct dma_fence *fence, ktime_t timestamp);
    signed long dma_fence_default_wait(struct dma_fence *fence,
                       bool intr, signed long timeout);
    int dma_fence_add_callback(struct dma_fence *fence,
@@ -422,7 +423,7 @@ dma_fence_is_signaled_locked(struct dma_fence *fence)
            return true;
          if (fence->ops->signaled && fence->ops->signaled(fence)) {
-        dma_fence_signal_locked(fence);
+        dma_fence_signal_internal(fence, ktime_get());
            return true;
        }
    @@ -448,11 +449,15 @@ dma_fence_is_signaled_locked(struct dma_fence *fence)
    static inline bool
    dma_fence_is_signaled(struct dma_fence *fence)
    {
+    unsigned long flags;
+
        if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
            return true;
          if (fence->ops->signaled && fence->ops->signaled(fence)) {
-        dma_fence_signal(fence);
+        spin_lock_irqsave(fence->lock, flags);
+        dma_fence_signal_internal(fence, ktime_get());
+        spin_unlock_irqrestore(fence->lock, flags);
            return true;
        }





Reply via email to