On Fri, Feb 14, 2025 at 12:25:59AM +0100, Frederic Weisbecker wrote:
> A CPU coming online checks for an ongoing grace period and reports
> a quiescent state accordingly if needed. This special treatment that
> shortcuts the expedited IPI originated as an optimization in the
> following commit:
>
>     338b0f760e84 ("rcu: Better hotplug handling for
>     synchronize_sched_expedited()")
>
> The point is to avoid an IPI while waiting for a CPU to become online
> or failing to become offline.
>
> However, this is pointless and even error-prone, for several reasons:
>
> * If the CPU has been seen offline in the first round scanning offline
>   and idle CPUs, no IPI is even tried and the quiescent state is
>   reported on behalf of the CPU.
>
> * This means that if the IPI fails, the CPU just became offline. So
>   it is unlikely to become online right away, unless the CPU-hotplug
>   operation failed and rolled back, which is a rare event that can
>   wait a jiffy for a new IPI to be issued.
>
> * Moreover, the "optimization" for a failed CPU-hotplug-down operation
>   only applies to !PREEMPT_RCU.
>
> * This forcibly reports a quiescent state even if ->cpu_no_qs.b.exp is
>   not set. As a result, it can race with remote QS reports on the same
>   rdp. Fortunately it happens to be OK, but an accident is waiting to
>   happen.
>
> For all those reasons, remove this optimization, which does not look
> worth keeping around.
Thank you for digging into this!  When I ran tests that removed the call
to sync_sched_exp_online_cleanup() a few months ago, I got grace-period
hangs [1].  Has something changed to make this safe?

							Thanx, Paul

[1] https://docs.google.com/document/d/1-JQ4QYF1qid0TWSLa76O1kusdhER2wgm0dYdwFRUzU8/edit?usp=sharing

> Reported-by: Paul E. McKenney <paul...@kernel.org>
> Signed-off-by: Frederic Weisbecker <frede...@kernel.org>
> ---
>  kernel/rcu/tree.c     |  2 --
>  kernel/rcu/tree_exp.h | 45 ++-----------------------------------------
>  2 files changed, 2 insertions(+), 45 deletions(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 8625f616c65a..86935fe00397 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -151,7 +151,6 @@ static void rcu_report_qs_rnp(unsigned long mask, struct rcu_node *rnp,
>  			      unsigned long gps, unsigned long flags);
>  static void invoke_rcu_core(void);
>  static void rcu_report_exp_rdp(struct rcu_data *rdp);
> -static void sync_sched_exp_online_cleanup(int cpu);
>  static void check_cb_ovld_locked(struct rcu_data *rdp, struct rcu_node *rnp);
>  static bool rcu_rdp_is_offloaded(struct rcu_data *rdp);
>  static bool rcu_rdp_cpu_online(struct rcu_data *rdp);
> @@ -4259,7 +4258,6 @@ int rcutree_online_cpu(unsigned int cpu)
>  	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
>  	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
>  		return 0; /* Too early in boot for scheduler work. */
> -	sync_sched_exp_online_cleanup(cpu);
>
>  	// Stop-machine done, so allow nohz_full to disable tick.
>  	tick_dep_clear(TICK_DEP_BIT_RCU);
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index caff16e441d1..a3f962eb7057 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -751,12 +751,8 @@ static void rcu_exp_handler(void *unused)
>  	struct task_struct *t = current;
>
>  	/*
> -	 * First, is there no need for a quiescent state from this CPU,
> -	 * or is this CPU already looking for a quiescent state for the
> -	 * current grace period? If either is the case, just leave.
> -	 * However, this should not happen due to the preemptible
> -	 * sync_sched_exp_online_cleanup() implementation being a no-op,
> -	 * so warn if this does happen.
> +	 * WARN if the CPU is unexpectedly already looking for a
> +	 * QS or has already reported one.
>  	 */
>  	ASSERT_EXCLUSIVE_WRITER_SCOPED(rdp->cpu_no_qs.b.exp);
>  	if (WARN_ON_ONCE(!(READ_ONCE(rnp->expmask) & rdp->grpmask) ||
> @@ -803,11 +799,6 @@ static void rcu_exp_handler(void *unused)
>  	WARN_ON_ONCE(1);
>  }
>
> -/* PREEMPTION=y, so no PREEMPTION=n expedited grace period to clean up after. */
> -static void sync_sched_exp_online_cleanup(int cpu)
> -{
> -}
> -
>  /*
>   * Scan the current list of tasks blocked within RCU read-side critical
>   * sections, printing out the tid of each that is blocking the current
> @@ -885,38 +876,6 @@ static void rcu_exp_handler(void *unused)
>  	rcu_exp_need_qs();
>  }
>
> -/* Send IPI for expedited cleanup if needed at end of CPU-hotplug operation. */
> -static void sync_sched_exp_online_cleanup(int cpu)
> -{
> -	unsigned long flags;
> -	int my_cpu;
> -	struct rcu_data *rdp;
> -	int ret;
> -	struct rcu_node *rnp;
> -
> -	rdp = per_cpu_ptr(&rcu_data, cpu);
> -	rnp = rdp->mynode;
> -	my_cpu = get_cpu();
> -	/* Quiescent state either not needed or already requested, leave. */
> -	if (!(READ_ONCE(rnp->expmask) & rdp->grpmask) ||
> -	    READ_ONCE(rdp->cpu_no_qs.b.exp)) {
> -		put_cpu();
> -		return;
> -	}
> -	/* Quiescent state needed on current CPU, so set it up locally. */
> -	if (my_cpu == cpu) {
> -		local_irq_save(flags);
> -		rcu_exp_need_qs();
> -		local_irq_restore(flags);
> -		put_cpu();
> -		return;
> -	}
> -	/* Quiescent state needed on some other CPU, send IPI. */
> -	ret = smp_call_function_single(cpu, rcu_exp_handler, NULL, 0);
> -	put_cpu();
> -	WARN_ON_ONCE(ret);
> -}
> -
>  /*
>   * Because preemptible RCU does not exist, we never have to check for
>   * tasks blocked within RCU read-side critical sections that are
> --
> 2.46.0
>