On Fri, Feb 14, 2025 at 12:25:57AM +0100, Frederic Weisbecker wrote:
> When a grace period is started, the ->expmask of each node is set up
> from sync_exp_reset_tree(). Then later on each leaf node also
> initializes its ->exp_tasks pointer.
> 
> This means that the initialization of the quiescent state of a node
> and the initialization of its blocking tasks happen with an unlocked
> node gap in-between.
> 
> It happens to be fine because nothing is expected to report an exp
> quiescent state within this gap, since no IPIs have been issued yet
> and every rdp's ->cpu_no_qs.b.exp should be false.
> 
> However if it were to happen by accident, the quiescent state could
> be reported and propagated while ignoring tasks that blocked _before_
> the start of the grace period.
> 
> Prevent such trouble from happening in the future and initialize both
> the quiescent state mask to report and the blocked tasks head from
> within the same locked node block.
> 
> Signed-off-by: Frederic Weisbecker <frede...@kernel.org>
Thank you for looking into this!  One question:  What happens if a CPU
has tasks pending during the call to sync_exp_reset_tree(), but all of
these tasks complete their RCU read-side critical sections before
execution reaches __sync_rcu_exp_select_node_cpus()?

(My guess is that all is well, but even if so, it would be good to
record why in the commit log.)

							Thanx, Paul

> ---
>  kernel/rcu/tree_exp.h | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index 8d4895c854c5..caff16e441d1 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -141,6 +141,13 @@ static void __maybe_unused sync_exp_reset_tree(void)
>  		raw_spin_lock_irqsave_rcu_node(rnp, flags);
>  		WARN_ON_ONCE(rnp->expmask);
>  		WRITE_ONCE(rnp->expmask, rnp->expmaskinit);
> +		/*
> +		 * Need to wait for any blocked tasks as well. Note that
> +		 * additional blocking tasks will also block the expedited GP
> +		 * until such time as the ->expmask bits are cleared.
> +		 */
> +		if (rcu_is_leaf_node(rnp) && rcu_preempt_has_tasks(rnp))
> +			WRITE_ONCE(rnp->exp_tasks, rnp->blkd_tasks.next);
>  		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
>  	}
>  }
> @@ -393,13 +400,6 @@ static void __sync_rcu_exp_select_node_cpus(struct rcu_exp_work *rewp)
>  	}
>  	mask_ofl_ipi = rnp->expmask & ~mask_ofl_test;
> 
> -	/*
> -	 * Need to wait for any blocked tasks as well. Note that
> -	 * additional blocking tasks will also block the expedited GP
> -	 * until such time as the ->expmask bits are cleared.
> -	 */
> -	if (rcu_preempt_has_tasks(rnp))
> -		WRITE_ONCE(rnp->exp_tasks, rnp->blkd_tasks.next);
>  	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> 
> 	/* IPI the remaining CPUs for expedited quiescent state. */
> -- 
> 2.46.0
> 
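
For readers following the locking argument above, a minimal stand-alone
sketch of the pattern the patch enforces: publish the quiescent-state mask
and the blocked-tasks pointer inside one critical section, so a concurrent
reporter can never observe the mask without also seeing the tasks it must
wait on. All names here (struct node, init_node_for_gp, and so on) are
hypothetical illustrations, not the kernel's RCU API.

#include <pthread.h>
#include <stddef.h>

struct task;				/* opaque; stands in for blocked tasks */

struct node {
	pthread_mutex_t lock;
	unsigned long exp_mask;		/* CPUs still owing a quiescent state */
	struct task *blkd_tasks;	/* tasks blocked in read-side sections */
	struct task *exp_tasks;		/* tasks this expedited GP must wait on */
};

static void init_node_for_gp(struct node *np, unsigned long mask)
{
	pthread_mutex_lock(&np->lock);
	np->exp_mask = mask;
	/*
	 * Same critical section: there is no window in which a concurrent
	 * reporter can see exp_mask set while exp_tasks has not yet been
	 * initialized -- that unlocked gap is what the patch closes.
	 */
	if (np->blkd_tasks != NULL)
		np->exp_tasks = np->blkd_tasks;
	pthread_mutex_unlock(&np->lock);
}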