On Tue, Apr 01, 2025 at 12:30:40PM -0400, Joel Fernandes wrote:
> Hello, Frederic,
>
> On Tue, 1 Apr 2025 16:27:36 GMT, Frederic Weisbecker wrote:
> > On Mon, Mar 31, 2025 at 02:29:52PM -0700, Paul E. McKenney wrote:
> > > The disagreement is a feature, at least up to a point.  That feature
> > > allows CPUs to go idle for long periods without RCU having to bother
> > > them or to mess with their per-CPU data (give or take ->gpwrap).  It also
> > > allows per-rcu_node-leaf locking, which is important on large systems.
> > >
> > > Trying to make precisely globally agreed-on beginnings and ends of
> > > RCU grace periods will not end well from performance, scalability,
> > > or real-time-response viewpoints.  ;-)
> >
> > The distributed disagreement is definitely a feature. The duplicate root
> > is more debatable.
> >
> > > But simplifications that don't hurt performance, scalability, and
> > > real-time-response are of course welcome.
> >
> > I'm not even sure my proposal is a simplification. Perhaps it is. Another
> > hope is that it could avoid future accidents.
> >
> > Aside from the performance concerns:
>
> Sorry if this is silly, but could you provide a small hint as to how
> unifying the global counter with the node affects QS reporting or hotplug?
> It was not immediately obvious to me. Thanks for the help.
First of all, rcu_seq_start() must happen before the hotplug scan,
otherwise you run into this:

rcu_state.gp_seq = 4

CPU 0 / rcu_gp_kthread()            CPU 1                               CPU 2
------------------------            -----                               -----
// rcu_gp_init()
rcu_for_each_leaf_node(rnp) {
    raw_spin_lock_rcu_node(rnp);
    rnp->qsmaskinit = rnp->qsmaskinitnext;
    raw_spin_unlock_rcu_node(rnp);
}
                                    rcutree_report_cpu_starting()
                                        raw_spin_lock_rcu_node(rnp);
                                        rnp->qsmaskinitnext |= rdp->grpmask;
                                        raw_spin_unlock_rcu_node(rnp);
                                    rcu_read_lock()
                                    r0 = *X
                                                                        r1 = *X
                                                                        X = NULL
                                                                        cookie = get_state_synchronize_rcu()
                                                                        // cookie == 8
rcu_seq_start(&rcu_state.gp_seq)
// rcu_state.gp_seq == 5
rcu_for_each_node_breadth_first(rnp) {
    raw_spin_lock_rcu_node(rnp);
    // Ignores CPU 1
    rnp->qsmask = rnp->qsmaskinit;
    raw_spin_unlock_rcu_node(rnp);
}
[...]
// rcu_gp_cleanup()
rcu_seq_end(&rcu_state.gp_seq)
// rcu_state.gp_seq == 8
                                                                        poll_state_synchronize_rcu(cookie) // true
                                                                        kfree(r1)
                                    r2 = *r0 // CRASH

Here the new grace period ignores CPU 1 (the scan ran before CPU 1 set
its bit), yet CPU 1's reader began before CPU 2's cookie was taken, so
poll_state_synchronize_rcu() returns true while the reader still holds
a reference to the freed object. Doing rcu_seq_start() before the scan
closes the window: a CPU missed by the scan then necessarily observes
the already-started gp_seq before it can run a reader.

The same applies if we convert rcu_state to use the root node. But if
we do rcu_seq_start() on the root node, then an update side can call
note_gp_changes() because of the state change (only if the root node is
also the unique leaf), and that happens before the loop that
initializes all the ->qsmask fields. It's not a correctness problem,
because it won't make the rdp report a QS too early (rnp->qsmask isn't
initialized yet anyway), but note_gp_changes() would needlessly take
the rnp lock just to record the state change in rdp->gp_seq.

This is why we need an intermediate state, RCU_SEQ_STARTED, during
which note_gp_changes() can safely ignore the state change. Then, once
the root's ->qsmask is initialized, the state can switch to
RCU_SEQ_WAIT_QS, after which calling note_gp_changes() becomes useful.

Hope that helped.

Thanks.
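In case the ordering is easier to see as code, below is a condensed
sketch of the rcu_gp_init() sequence described above. It is not the
literal tree.c code (rcu_gp_init_sketch() is a made-up name and the
many locking subtleties are omitted), just the shape of the ordering:

static void rcu_gp_init_sketch(void)
{
        struct rcu_node *rnp;

        /*
         * 1) Advance the grace-period sequence *first*, so that a CPU
         *    coming online is either caught by the scan below or
         *    observes the already-started gp_seq via its rnp lock.
         */
        rcu_seq_start(&rcu_state.gp_seq);

        /* 2) Only then snapshot the set of online CPUs. */
        rcu_for_each_leaf_node(rnp) {
                raw_spin_lock_irq_rcu_node(rnp);
                rnp->qsmaskinit = rnp->qsmaskinitnext;
                raw_spin_unlock_irq_rcu_node(rnp);
        }

        /* 3) Finally arm ->qsmask from the snapshot. */
        rcu_for_each_node_breadth_first(rnp) {
                raw_spin_lock_irq_rcu_node(rnp);
                rnp->qsmask = rnp->qsmaskinit;
                raw_spin_unlock_irq_rcu_node(rnp);
        }
}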
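And a hypothetical sketch of the intermediate state proposed above.
None of RCU_SEQ_IDLE, RCU_SEQ_STARTED or RCU_SEQ_WAIT_QS exist in the
kernel today (though the two low state bits of gp_seq have room for
them), and note_gp_changes_sketch() only illustrates the idea, it is
not the real note_gp_changes():

enum rcu_seq_gp_state {
        RCU_SEQ_IDLE    = 0,    /* No grace period in progress. */
        RCU_SEQ_STARTED = 1,    /* GP begun, root ->qsmask not yet armed. */
        RCU_SEQ_WAIT_QS = 2,    /* ->qsmask armed, QS reporting useful. */
};

static void note_gp_changes_sketch(struct rcu_data *rdp)
{
        struct rcu_node *rnp = rdp->mynode;

        /*
         * While the GP is only RCU_SEQ_STARTED, the root's ->qsmask is
         * not initialized yet, so there is nothing useful to record:
         * skip the rnp lock entirely instead of taking it for nothing.
         */
        if (rcu_seq_state(rcu_seq_current(&rnp->gp_seq)) == RCU_SEQ_STARTED)
                return;

        /* RCU_SEQ_WAIT_QS: now it is worth taking the lock. */
        raw_spin_lock_rcu_node(rnp);
        rdp->gp_seq = rnp->gp_seq;
        raw_spin_unlock_rcu_node(rnp);
}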