On Fri, Feb 14, 2025 at 01:10:52PM +0100, Frederic Weisbecker wrote: > Le Fri, Feb 14, 2025 at 01:01:56AM -0800, Paul E. McKenney a écrit : > > On Fri, Feb 14, 2025 at 12:25:59AM +0100, Frederic Weisbecker wrote: > > > A CPU coming online checks for an ongoing grace period and reports > > > a quiescent state accordingly if needed. This special treatment that > > > shortcuts the expedited IPI finds its origin as an optimization purpose > > > on the following commit: > > > > > > 338b0f760e84 (rcu: Better hotplug handling for > > > synchronize_sched_expedited() > > > > > > The point is to avoid an IPI while waiting for a CPU to become online > > > or failing to become offline. > > > > > > However this is pointless and even error prone for several reasons: > > > > > > * If the CPU has been seen offline in the first round scanning offline > > > and idle CPUs, no IPI is even tried and the quiescent state is > > > reported on behalf of the CPU. > > > > > > * This means that if the IPI fails, the CPU just became offline. So > > > it's unlikely to become online right away, unless the cpu hotplug > > > operation failed and rolled back, which is a rare event that can > > > wait a jiffy for a new IPI to be issued.
But the expedited grace period might be preempted for an arbitrarily long period, especially if a hypervisor is in play. And we do drop that lock midway through... > > > * But then the "optimization" applying on failing CPU hotplug down only > > > applies to !PREEMPT_RCU. Yes, definitely only non-preemptible RCU. > > > * This force reports a quiescent state even if ->cpu_no_qs.b.exp is not > > > set. As a result it can race with remote QS reports on the same rdp. > > > Fortunately it happens to be OK but an accident is waiting to happen. To your point, I did in fact incorrectly decide that this was a bug. ;-) > > > For all those reasons, remove this optimization that doesn't look worthy > > > to keep around. > > > > Thank you for digging into this! > > > > When I ran tests that removed the call to sync_sched_exp_online_cleanup() > > a few months ago, I got grace-period hangs [1]. Has something changed > > to make this safe? > > Hmm, but was it before or after "rcu: Fix get_state_synchronize_rcu_full() > GP-start detection" ? Before. There was also some buggy debug code in play. Also, to get the failure, it was necessary to make TREE03 disable preemption, as stock TREE03 has an empty sync_sched_exp_online_cleanup() function. I am rerunning the test with a WARN_ON_ONCE() after the early exit from the sync_sched_exp_online_cleanup(). Of course, lack of a failure does not necessairly indicate > And if after do we know why? Here are some (possibly bogus) possibilities that came to mind: 1. There is some coming-online race that deprives the incoming CPU of an IPI, but nevertheless marks that CPU as blocking the current grace period. 2. Some strange scenario involves the CPU going offline for just a little bit, so that the IPI gets wasted on the outgoing due to neither of the "if" conditions in rcu_exp_handler() being true. The outgoing CPU just says "I need a QS", then leaves and comes back. (The expedited grace period doesn't retry because it believes that it already sent that IPI.) 3. Your ideas here! ;-) Thanx, Paul