On 19.08.25 11:04, Janusz Krzysztofik wrote:
> On Monday, 18 August 2025 16:42:56 CEST Christian König wrote:
>> On 18.08.25 16:30, Janusz Krzysztofik wrote:
>>> Hi Christian,
>>>
>>> On Thursday, 14 August 2025 14:24:29 CEST Christian König wrote:
>>>>
>>>> On 14.08.25 10:16, Janusz Krzysztofik wrote:
>>>>> When the first user starts waiting on a not yet signaled fence of a
>>>>> chain link, a dma_fence_chain callback is added to the user fence of
>>>>> that link.  When the user fence of that chain link is then signaled,
>>>>> the chain is traversed in search of the first unsignaled link, and
>>>>> the callback is rearmed on the user fence of that link.
>>>>>
>>>>> Since chain fences may be exposed to user space, e.g. via drm_syncobj
>>>>> IOCTLs, users may start waiting on any link of the chain; then many
>>>>> links of a chain may have signaling enabled and their callbacks added
>>>>> to their user fences.  Once an arbitrary user fence is signaled, all
>>>>> dma_fence_chain callbacks added to it so far must be rearmed on
>>>>> another user fence of the chain.  In extreme scenarios, when all N
>>>>> links of a chain are awaited and then signaled in reverse order, the
>>>>> dma_fence_chain callback may be called up to N * (N + 1) / 2 times
>>>>> (an arithmetic series).
>>>>>
>>>>> To avoid that potentially excessive accumulation of dma_fence_chain
>>>>> callbacks, rearm a trimmed-down, signal-only callback version on the
>>>>> base fence of the previous link, if not yet signaled; otherwise just
>>>>> signal the base fence of the current link, instead of traversing the
>>>>> chain in search of the first unsignaled link and moving all callbacks
>>>>> collected so far to the user fence of that link.
>>>>
>>>> Well, clear NAK to that!  You can easily overflow the kernel stack
>>>> with that!
>>>
>>> I'll be happy to propose a better solution, but for that I need to
>>> understand your message better.
>>> Could you please point out an exact piece of the proposed code and/or
>>> describe a scenario where you can see the risk of stack overflow?
>>
>> The sentence "rearm .. to the base fence of a previous link" sounds like
>> you are trying to install a callback on the signaling of the previous
>> chain element.
>>
>> That is exactly what I pointed out previously, where you need to be
>> super careful: when this chain signals, the callbacks will execute
>> recursively, which means that you can trivially overflow the kernel
>> stack if you have more than a handful of chain elements.
>>
>> In other words, A waits for B, B waits for C, C waits for D, etc. ...
>> When D finally signals, it will call C, which in turn calls B, which in
>> turn calls A.
>
> OK, maybe my commit description was not precise enough; however, I
> intentionally didn't describe implementation details (the "how").
> When D signals, it doesn't call C directly; it submits an irq work that
> calls C.  Then C doesn't just call B; it submits another irq work that
> calls B, and so on.
> Doesn't that code pattern effectively break the recursion loop into
> separate work items, each with its own separate stack?
No, it's architecture dependent whether the irq_work executes on a separate
stack or not.  You would need a work_struct to really separate the two, and
I would reject that because it adds additional latency to the signaling
path.

>>
>> Even if the chain is a recursive data structure, you absolutely can't
>> use recursion for the handling of it.
>>
>> Maybe I misunderstood your textual description, but reading a sentence
>> like this rings all alarm bells here.  Otherwise I can't see what the
>> patch is supposed to be optimizing.
>
> OK, maybe I should start the commit description of this patch with a
> copy of the first sentence from the cover letter and from the patch 1/4
> description, which describes the problem as reported by CI.  Maybe I
> should also provide a comparison of measured signaling times from trybot
> executions [1][2][3].
> Here are example numbers from CI machine fi-bsw-n3050:

Yeah, and I've pointed out before that this is irrelevant.

The problem is *not* the dma_fence_chain implementation; that one is doing
exactly what it is expected to do.  The problem is that the test case is
nonsense.

I've pointed that out numerous times now!

Regards,
Christian.

>
> With signaling time reports only added to selftests (patch 1 of 4):
> <6> [777.914451] dma-buf: Running dma_fence_chain/wait_forward
> <6> [778.123516] wait_forward: 4096 signals in 21373487 ns
> <6> [778.335709] dma-buf: Running dma_fence_chain/wait_backward
> <6> [795.791546] wait_backward: 4096 signals in 17249051192 ns
> <6> [795.859699] dma-buf: Running dma_fence_chain/wait_random
> <6> [796.161375] wait_random: 4096 signals in 97386256 ns
>
> With dma_fence_enable_signaling() replaced in selftests with
> dma_fence_wait() (patches 1-3 of 4):
> <6> [782.505692] dma-buf: Running dma_fence_chain/wait_forward
> <6> [784.609213] wait_forward: 4096 signals in 36513103 ns
> <3> [784.837226] Reported -4 for kthread_stop_put(0)!
> <6> [785.147643] dma-buf: Running dma_fence_chain/wait_backward
> <6> [806.367763] wait_backward: 4096 signals in 18428009499 ns
> <6> [807.175325] dma-buf: Running dma_fence_chain/wait_random
> <6> [809.453942] wait_random: 4096 signals in 119761950 ns
>
> With the fix (patches 1-4 of 4):
> <6> [731.519020] dma-buf: Running dma_fence_chain/wait_forward
> <6> [733.623375] wait_forward: 4096 signals in 31890220 ns
> <6> [734.258972] dma-buf: Running dma_fence_chain/wait_backward
> <6> [736.267325] wait_backward: 4096 signals in 39007955 ns
> <6> [736.700221] dma-buf: Running dma_fence_chain/wait_random
> <6> [739.346706] wait_random: 4096 signals in 48384865 ns
>
> Signaling time in the wait_backward selftest has been reduced from 17 s
> to 39 ms.
>
> [1] https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_152785v1/index.html?
> [2] https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_152828v2/index.html?
> [3] https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_152830v2/index.html?
>
>>
>>>>
>>>> Additionally, messing with the fence ops outside of the dma_fence
>>>> code is an absolute no-go.
>>>
>>> Could you please explain what piece of code you are referring to when
>>> you say "messing with the fence ops outside the dma_fence code"?  If
>>> not this patch, then which particular one of this series did you mean?
>>> I'm assuming you didn't mean the drm_syncobj code that I mentioned in
>>> my commit descriptions.
>>
>> See below.
>>
>>>
>>> Thanks,
>>> Janusz
>>>
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>>
>>>>> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12904
>>>>> Suggested-by: Chris Wilson <chris.p.wil...@linux.intel.com>
>>>>> Signed-off-by: Janusz Krzysztofik <janusz.krzyszto...@linux.intel.com>
>>>>> ---
>>>>>  drivers/dma-buf/dma-fence-chain.c | 101 +++++++++++++++++++++++++-----
>>>>>  1 file changed, 84 insertions(+), 17 deletions(-)
>>>>>
>>>>> diff --git a/drivers/dma-buf/dma-fence-chain.c b/drivers/dma-buf/dma-fence-chain.c
>>>>> index a8a90acf4f34d..90eff264ee05c 100644
>>>>> --- a/drivers/dma-buf/dma-fence-chain.c
>>>>> +++ b/drivers/dma-buf/dma-fence-chain.c
>>>>> @@ -119,46 +119,113 @@ static const char *dma_fence_chain_get_timeline_name(struct dma_fence *fence)
>>>>>  	return "unbound";
>>>>>  }
>>>>>
>>>>> -static void dma_fence_chain_irq_work(struct irq_work *work)
>>>>> +static void signal_irq_work(struct irq_work *work)
>>>>>  {
>>>>>  	struct dma_fence_chain *chain;
>>>>>
>>>>>  	chain = container_of(work, typeof(*chain), work);
>>>>>
>>>>> -	/* Try to rearm the callback */
>>>>> -	if (!dma_fence_chain_enable_signaling(&chain->base))
>>>>> -		/* Ok, we are done. No more unsignaled fences left */
>>>>> -		dma_fence_signal(&chain->base);
>>>>> +	dma_fence_signal(&chain->base);
>>>>>  	dma_fence_put(&chain->base);
>>>>>  }
>>>>>
>>>>> -static void dma_fence_chain_cb(struct dma_fence *f, struct dma_fence_cb *cb)
>>>>> +static void signal_cb(struct dma_fence *f, struct dma_fence_cb *cb)
>>>>> +{
>>>>> +	struct dma_fence_chain *chain;
>>>>> +
>>>>> +	chain = container_of(cb, typeof(*chain), cb);
>>>>> +	init_irq_work(&chain->work, signal_irq_work);
>>>>> +	irq_work_queue(&chain->work);
>>>>> +}
>>>>> +
>>>>> +static void rearm_irq_work(struct irq_work *work)
>>>>> +{
>>>>> +	struct dma_fence_chain *chain;
>>>>> +	struct dma_fence *prev;
>>>>> +
>>>>> +	chain = container_of(work, typeof(*chain), work);
>>>>> +
>>>>> +	rcu_read_lock();
>>>>> +	prev = rcu_dereference(chain->prev);
>>>>> +	if (prev && dma_fence_add_callback(prev, &chain->cb, signal_cb))
>>>>> +		prev = NULL;
>>>>> +	rcu_read_unlock();
>>>>> +	if (prev)
>>>>> +		return;
>>>>> +
>>>>> +	/* Ok, we are done. No more unsignaled fences left */
>>>>> +	signal_irq_work(work);
>>>>> +}
>>>>> +
>>>>> +static inline bool fence_is_signaled__nested(struct dma_fence *fence)
>>>>> +{
>>>>> +	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>>>>> +		return true;
>>>>> +
>>
>>>>> +	if (fence->ops->signaled && fence->ops->signaled(fence)) {
>>
>> Calling this outside of dma-fence.[ch] is a clear no-go.
>
> But this patch applies only to drivers/dma-buf/dma-fence-chain.c, not
> outside of it.
>
> Thanks,
> Janusz
>
>>
>> Regards,
>> Christian.
>>
>>>>> +		unsigned long flags;
>>>>> +
>>>>> +		spin_lock_irqsave_nested(fence->lock, flags, SINGLE_DEPTH_NESTING);
>>>>> +		dma_fence_signal_locked(fence);
>>>>> +		spin_unlock_irqrestore(fence->lock, flags);
>>>>> +
>>>>> +		return true;
>>>>> +	}
>>>>> +
>>>>> +	return false;
>>>>> +}
>>>>> +
>>>>> +static bool prev_is_signaled(struct dma_fence_chain *chain)
>>>>> +{
>>>>> +	struct dma_fence *prev;
>>>>> +	bool result;
>>>>> +
>>>>> +	rcu_read_lock();
>>>>> +	prev = rcu_dereference(chain->prev);
>>>>> +	result = !prev || fence_is_signaled__nested(prev);
>>>>> +	rcu_read_unlock();
>>>>> +
>>>>> +	return result;
>>>>> +}
>>>>> +
>>>>> +static void rearm_or_signal_cb(struct dma_fence *f, struct dma_fence_cb *cb)
>>>>>  {
>>>>>  	struct dma_fence_chain *chain;
>>>>>
>>>>>  	chain = container_of(cb, typeof(*chain), cb);
>>>>> -	init_irq_work(&chain->work, dma_fence_chain_irq_work);
>>>>> +	if (prev_is_signaled(chain)) {
>>>>> +		/* Ok, we are done. No more unsignaled fences left */
>>>>> +		init_irq_work(&chain->work, signal_irq_work);
>>>>> +	} else {
>>>>> +		/* Try to rearm the callback */
>>>>> +		init_irq_work(&chain->work, rearm_irq_work);
>>>>> +	}
>>>>> +
>>>>>  	irq_work_queue(&chain->work);
>>>>> -	dma_fence_put(f);
>>>>>  }
>>>>>
>>>>>  static bool dma_fence_chain_enable_signaling(struct dma_fence *fence)
>>>>>  {
>>>>>  	struct dma_fence_chain *head = to_dma_fence_chain(fence);
>>>>> +	int err = -ENOENT;
>>>>>
>>>>> -	dma_fence_get(&head->base);
>>>>> -	dma_fence_chain_for_each(fence, &head->base) {
>>>>> -		struct dma_fence *f = dma_fence_chain_contained(fence);
>>>>> +	if (WARN_ON(!head))
>>>>> +		return false;
>>>>>
>>>>> -		dma_fence_get(f);
>>>>> -		if (!dma_fence_add_callback(f, &head->cb, dma_fence_chain_cb)) {
>>>>> +	dma_fence_get(fence);
>>>>> +	if (head->fence)
>>>>> +		err = dma_fence_add_callback(head->fence, &head->cb,
>>>>> 					     rearm_or_signal_cb);
>>>>> +	if (err) {
>>>>> +		if (prev_is_signaled(head)) {
>>>>>  			dma_fence_put(fence);
>>>>> -			return true;
>>>>> +		} else {
>>>>> +			init_irq_work(&head->work, rearm_irq_work);
>>>>> +			irq_work_queue(&head->work);
>>>>> +			err = 0;
>>>>>  		}
>>>>> -		dma_fence_put(f);
>>>>>  	}
>>>>> -	dma_fence_put(&head->base);
>>>>> -	return false;
>>>>> +
>>>>> +	return !err;
>>>>>  }
>>>>>
>>>>>  static bool dma_fence_chain_signaled(struct dma_fence *fence)
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>
>
>
>