On Fri, Mar 14, 2025 at 03:36:42PM +0100, Frederic Weisbecker wrote:
> A CPU within a hotplug operation can make the RCU exp kworker lag if:
> 
> * The dying CPU is running after CPUHP_TEARDOWN_CPU but before
>   rcutree_report_cpu_dead(). It is too late to send an IPI but RCU is
>   still watching the CPU. Therefore the exp kworker can only wait for
>   the target to reach rcutree_report_cpu_dead().
> 
> * The booting CPU is running after rcutree_report_cpu_starting() but
>   before set_cpu_online(). RCU is watching the CPU but it is too early
>   to be able to send an IPI. Therefore the exp kworker can only wait
>   until it observes the CPU as officially online.
> 
> Such a lag is expected to be very short. However, #VMEXIT and other
> hazards can get in the way. Report long delays; 10 jiffies is already
> considered a high threshold.
> 
> Reported-by: Paul E. McKenney <paul...@kernel.org>
> Signed-off-by: Frederic Weisbecker <frede...@kernel.org>
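[For readers less familiar with the hotplug sequence, here is a rough
comment-style sketch of the two windows described above; this is an
editorial illustration only, not part of the patch, and uses only the
transition points named in the changelog.]

	/*
	 * Offline path:  ... -> CPUHP_TEARDOWN_CPU -> [window: RCU still
	 *                watching the CPU, but too late to send an IPI]
	 *                -> rcutree_report_cpu_dead()
	 *
	 * Online path:   rcutree_report_cpu_starting() -> [window: RCU
	 *                watching the CPU, but too early to send an IPI]
	 *                -> set_cpu_online()
	 *
	 * In both windows the exp kworker can only wait for the CPU to
	 * leave the window, which is what makes the lag possible.
	 */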
Same CONFIG_PROVE_RCU question, same conditional:

Reviewed-by: Paul E. McKenney <paul...@kernel.org>

> ---
>  kernel/rcu/tree_exp.h | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index 6058a734090c..87a44423927d 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -406,8 +406,18 @@ static void __sync_rcu_exp_select_node_cpus(struct rcu_exp_work *rewp)
>  	for_each_leaf_node_cpu_mask(rnp, cpu, mask_ofl_ipi) {
>  		struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
>  		unsigned long mask = rdp->grpmask;
> +		int nr_retries = 0;
>  
>  retry_ipi:
> +		/*
> +		 * In case of retrying, the CPU is lagging either:
> +		 *
> +		 * - between CPUHP_TEARDOWN_CPU and rcutree_report_cpu_dead()
> +		 * or:
> +		 * - between rcutree_report_cpu_starting() and set_cpu_online()
> +		 */
> +		WARN_ON_ONCE(nr_retries++ > 10);
> +
>  		if (rcu_watching_snap_stopped_since(rdp, rdp->exp_watching_snap)) {
>  			mask_ofl_test |= mask;
>  			continue;
> -- 
> 2.48.1
> 
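[Editorial note: assuming the CONFIG_PROVE_RCU question above is whether the
new warning should only be armed on proving kernels, one possible shape of
such a conditional is sketched below. This is not part of the posted patch
and resolving that question is exactly what the conditional Reviewed-by is
waiting on.]

	nr_retries++;
	/*
	 * Keep counting retries on all configurations, but only arm the
	 * warning when CONFIG_PROVE_RCU=y; IS_ENABLED() lets the compiler
	 * drop the check entirely otherwise.
	 */
	WARN_ON_ONCE(IS_ENABLED(CONFIG_PROVE_RCU) && nr_retries > 10);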