On Fri, Jan 16, 2026 at 11:42:28PM +0100, Frederic Weisbecker wrote:
> On Wed, Jan 14, 2026 at 12:31:53PM -0500, Joel Fernandes wrote:
> > During callback overload, the NOCB code attempts an opportunistic
> > advancement via rcu_advance_cbs_nowake().
> > 
> > Analysis via tracing with 300,000 callbacks flooded shows this
> > optimization is likely dead code:
> > - 30 overload conditions triggered
> > - 0 advancements actually occurred
> > - 100% of time no advancement due to current GP not done.
> > 
> > I also ran TREE05 and TREE08 for 2 hours and cannot trigger it.
> > 
> > When callbacks overflow (exceed qhimark), they are waiting for a grace
> > period that hasn't completed yet. The optimization requires the GP to be
> > complete to advance callbacks, but the overload condition itself is
> > caused by callbacks piling up faster than GPs can complete. This creates
> > a logical contradiction where the advancement cannot happen.
> > 
> > In *theory* this might be possible, the GP completed just in the nick of
> > time as we hit the overload, but this is just so rare that it can be
> > considered impossible when we cannot even hit it with synthetic callback
> > flooding even, it is a waste of cycles to even try to advance, let alone
> > be useful and is a maintenance burden complexity we don't need.
> 
> Rare is far from impossible with billions of android devices living out there.
> 
> I can imagine the warning just hitting if the flooding callback enqueuer happens
> to hit the qhimark right after the GP has completed but before nocb_gp_wait()
> has managed to advance the callbacks.
> 
> But what would that prove then?

I agree with you. I think the original goal of this code path was to help
nocb_gp_wait() by doing some of the advancement work early when we already
know we're in an overload situation. But in all my testing, the cblist is
already advanced by the time we get here - making this path pointless as
you noted.
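
To make the argument from the changelog concrete, here is a toy user-space
model in C. It is only a sketch, not the kernel code: the toy_* names and the
struct are made up for illustration, there is a single pending batch instead
of the real segmented list, and it ignores the wraparound-safe sequence
comparisons the kernel uses. It shows that an overload-time advance attempt
does nothing while the grace period the callbacks wait for is still pending.

```c
#include <stdbool.h>

/* Hypothetical toy model: one batch of callbacks waiting on one GP. */
struct toy_cblist {
	unsigned long wait_gp;	/* GP number the pending callbacks wait for */
	long n_pending;		/* callbacks still waiting for wait_gp */
	long n_done;		/* callbacks ready to be invoked */
};

/*
 * Move pending callbacks to "done" only if their GP has completed.
 * Returns true if anything was advanced.
 */
static bool toy_advance(struct toy_cblist *cl, unsigned long completed_gp)
{
	if (cl->n_pending == 0 || completed_gp < cl->wait_gp)
		return false;	/* GP not done yet: nothing to advance */

	cl->n_done += cl->n_pending;
	cl->n_pending = 0;
	return true;
}

/*
 * Enqueue one callback. When the pending count exceeds qhimark, try the
 * opportunistic advance. Returns true only if that attempt advanced
 * something.
 */
static bool toy_enqueue(struct toy_cblist *cl, unsigned long completed_gp,
			long qhimark)
{
	cl->n_pending++;
	if (cl->n_pending <= qhimark)
		return false;	/* not overloaded */

	return toy_advance(cl, completed_gp);
}
```

In this model, flooding toy_enqueue() while completed_gp stays below wait_gp
never advances anything, no matter how far past qhimark we go. The attempt
only succeeds in the narrow window where the GP has already completed but
nothing else has advanced the list yet.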

> > I suggest deletion. However, add a WARN_ON_ONCE for a merge window or 2
> > and delete it after out of extreme caution.
> 
> 2 merge windows is the least of time for that warning to ever land on the
> billions machines. My phone still runs a v5.4 kernel :-)
> 
> And the patch doesn't quite qualify for a stable backport.
> 
> Anyway, consider an unpleasant case where nocb_gp_wait() is starving for
> example. How would just advancing the callbacks help? We still need
> nocb_gp_wait() to run its round to eventually wake up nocb_cb_wait()
> so that the done callbacks are executed. And before doing that, it needs
> to advance the callbacks anyway...
> 
> I'm personally in favour of removing this right away instead, unless Paul
> has a good reason that I missed?

Agreed. Unless I hear otherwise from others, I will delete it in my respin.
I also agree that it is better to delete it sooner rather than later -
waiting for a merge window or two buys us very little given how long it
takes for a kernel release to end up in products.

Thanks,

 - Joel
