On Wed, Mar 19, 2025 at 10:42:38AM +0100, Frederic Weisbecker wrote:
> On Tue, Mar 18, 2025 at 10:22:33AM -0700, Paul E. McKenney wrote:
> > On Fri, Mar 14, 2025 at 03:36:42PM +0100, Frederic Weisbecker wrote:
> > > A CPU within hotplug operations can make the RCU exp kworker lag if:
> > >
> > > * The dying CPU is running after CPUHP_TEARDOWN_CPU but before
> > >   rcutree_report_cpu_dead(). It is too late to send an IPI but RCU is
> > >   still watching the CPU. Therefore the exp kworker can only wait for
> > >   the target to reach rcutree_report_cpu_dead().
> > >
> > > * The booting CPU is running after rcutree_report_cpu_starting() but
> > >   before set_cpu_online(). RCU is watching the CPU but it is too early
> > >   to be able to send an IPI. Therefore the exp kworker can only wait
> > >   until it observes the CPU as officially online.
> > >
> > > Such a lag is expected to be very short. However #VMEXIT and other
> > > hazards can get in the way. Report long delays; 10 jiffies is already
> > > considered a high threshold.
> > >
> > > Reported-by: Paul E. McKenney <paul...@kernel.org>
> > > Signed-off-by: Frederic Weisbecker <frede...@kernel.org>
> >
> > Same CONFIG_PROVE_RCU question, same conditional:
> >
> > Reviewed-by: Paul E. McKenney <paul...@kernel.org>
>
> I don't have a strong opinion on whether to keep this warning unconditional.
> Perhaps this can depend on CONFIG_PROVE_RCU.

You are the expert on that question, so your choice.  On this one,
I am but asking the questions.  ;-)

							Thanx, Paul

> Thanks.
>
> >
> > > ---
> > >  kernel/rcu/tree_exp.h | 10 ++++++++++
> > >  1 file changed, 10 insertions(+)
> > >
> > > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > > index 6058a734090c..87a44423927d 100644
> > > --- a/kernel/rcu/tree_exp.h
> > > +++ b/kernel/rcu/tree_exp.h
> > > @@ -406,8 +406,18 @@ static void __sync_rcu_exp_select_node_cpus(struct rcu_exp_work *rewp)
> > >  	for_each_leaf_node_cpu_mask(rnp, cpu, mask_ofl_ipi) {
> > >  		struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> > >  		unsigned long mask = rdp->grpmask;
> > > +		int nr_retries = 0;
> > >  
> > >  retry_ipi:
> > > +		/*
> > > +		 * In case of retrying, CPU either is lagging:
> > > +		 *
> > > +		 * - between CPUHP_TEARDOWN_CPU and rcutree_report_cpu_dead()
> > > +		 * or:
> > > +		 * - between rcutree_report_cpu_starting() and set_cpu_online()
> > > +		 */
> > > +		WARN_ON_ONCE(nr_retries++ > 10);
> > > +
> > >  		if (rcu_watching_snap_stopped_since(rdp, rdp->exp_watching_snap)) {
> > >  			mask_ofl_test |= mask;
> > >  			continue;
> > > -- 
> > > 2.48.1
> > >
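The bounded-retry warning added by the patch can be illustrated outside the kernel. What follows is a minimal userspace sketch, not kernel code: WARN_ON_ONCE_SKETCH() is a hypothetical stand-in for the kernel's WARN_ON_ONCE() (the condition is always evaluated, but the report fires at most once per call site), and cpu_ready() and lag_remaining are hypothetical stand-ins for the checks after retry_ipi: that decide whether the lagging CPU has reached rcutree_report_cpu_dead() or been marked online. As in the patch, the counter is post-incremented inside the warning condition, so the first pass does not count as a retry and the warning fires only once more than 10 retries have been needed. The commit message equates that bound with 10 jiffies, which assumes roughly one jiffy of delay per retry; this sketch does not sleep at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Threshold mirroring the patch's "nr_retries++ > 10". */
#define RETRY_WARN_THRESHOLD 10

/* Number of warnings emitted so far (illustration only). */
static int warn_count;

/*
 * Hypothetical userspace stand-in for the kernel's WARN_ON_ONCE():
 * the condition is always evaluated (so side effects such as
 * nr_retries++ always happen), but the report is printed at most
 * once per expansion site.
 */
#define WARN_ON_ONCE_SKETCH(cond)                                    \
	do {                                                         \
		static bool warned_;                                 \
		if ((cond) && !warned_) {                            \
			warned_ = true;                              \
			warn_count++;                                \
			fprintf(stderr, "WARNING: %s\n", #cond);     \
		}                                                    \
	} while (0)

/* Hypothetical: polls left before the simulated CPU becomes reachable. */
static int lag_remaining;

/*
 * Hypothetical stand-in for "the CPU reached rcutree_report_cpu_dead()
 * or was marked online": not ready until lag_remaining polls have passed.
 */
static bool cpu_ready(void)
{
	if (lag_remaining > 0) {
		lag_remaining--;
		return false;
	}
	return true;
}

/* Poll a simulated lagging CPU; return how many retries were needed. */
static int wait_for_cpu(int lag)
{
	int nr_retries = 0;

	assert(lag >= 0);
	lag_remaining = lag;
retry:
	WARN_ON_ONCE_SKETCH(nr_retries++ > RETRY_WARN_THRESHOLD);
	if (!cpu_ready())
		goto retry;	/* no delay between retries in this sketch */
	return nr_retries - 1;	/* the first pass is not a retry */
}
```

For example, wait_for_cpu(10) returns 10 without warning, while wait_for_cpu(11) warns once; later calls that exceed the threshold stay silent because the warning is once-only. Making the check depend on CONFIG_PROVE_RCU, as discussed in the thread, could be done in the kernel version by adding IS_ENABLED(CONFIG_PROVE_RCU) to the warning condition; the sketch leaves the check unconditional.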