Hi Frederic,

On 4/2/2025 10:22 AM, Frederic Weisbecker wrote:
>> On Tue, 1 Apr 2025 16:27:36 GMT, Frederic Weisbecker wrote:
>>> On Mon, Mar 31, 2025 at 02:29:52PM -0700, Paul E. McKenney wrote:
>>>> The disagreement is a feature, at least up to a point.  That feature
>>>> allows CPUs to go idle for long periods without RCU having to bother
>>>> them or to mess with their per-CPU data (give or take ->gpwrap).  It also
>>>> allows per-rcu_node-leaf locking, which is important on large systems.
>>>>
>>>> Trying to make precisely globally agreed-on beginnings and ends of
>>>> RCU grace periods will not end well from performance, scalability,
>>>> or real-time-response viewpoints.  ;-)
>>>
>>> The distributed disagreement is definitely a feature. The duplicate root
>>> is more debatable.
>>>
>>>> But simplifications that don't hurt performance, scalability, and
>>>> real-time-response are of course welcome.
>>>
>>> I'm not even sure my proposal is a simplification. Perhaps it is. Another
>>> hope is that it could avoid future accidents.
>>>
>>
>> Aside from the performance concerns:
>>
>> Sorry if this is silly, but could you provide a small hint as to how
>> unifying the global counter with the node affects QS reporting or hotplug?
>> It was not immediately obvious to me. Thanks for the help.
>
> First of all, rcu_seq_start() must come before the hotplug scan, otherwise
> you run into this:
>
> rcu_state.gp_seq = 4
>
> CPU 0 / rcu_gp_kthread()           CPU 1                              CPU 2
> ------------------------           -----                              -----
> //rcu_gp_init()
> rcu_for_each_leaf_node(rnp) {
>     raw_spin_lock_rcu_node(rnp);
>     rnp->qsmaskinit = rnp->qsmaskinitnext
>     raw_spin_unlock_rcu_node(rnp);
> }
>                                    rcutree_report_cpu_starting()
>                                        raw_spin_lock_rcu_node(rnp);
>                                        rnp->qsmaskinitnext |= rdp->grpmask
>                                        raw_spin_unlock_rcu_node(rnp);
>                                    rcu_read_lock()
>                                    r0 = *X
>                                                                       r1 = *X
>                                                                       X = NULL
>                                                                       cookie = get_state_synchronize_rcu()
>                                                                       //cookie = 8
> rcu_seq_start(&rcu_state.gp_seq)
> //rcu_state.gp_seq == 5
>
> rcu_for_each_node_breadth_first(rnp) {
>     raw_spin_lock_rcu_node(rnp);
>     // Ignore CPU 1
>     rnp->qsmask = rnp->qsmaskinit;
>     raw_spin_unlock_rcu_node(rnp);
> }
> [...]
>
> //rcu_gp_cleanup()
> rcu_seq_end(&rcu_state.gp_seq)
> // rcu_state.gp_seq == 8
>                                                                       poll_state_synchronize_rcu(cookie)
>                                                                       kfree(r1)
>                                    r2 = *r0 // CRASH

Makes sense! The GP would be too short if we did seq_start but missed the
incoming CPU that happened to start a reader before we did the seq_start.
(I replayed the gp_seq/cookie arithmetic in a small sketch in the P.S. below.)

> So the same applies if we convert rcu_state to use the root node.
> But if we do rcu_seq_start() on the root node, then an update side
> can call note_gp_changes() because of the state change (only if the
> root node is also the unique leaf). But this then happens before
> the loop that initializes all the ->qsmask.
>
> It's not a correctness problem because it won't make the rdp report
> a QS too early, since rnp->qsmask isn't initialized anyway, but
> note_gp_changes() would needlessly take the rnp lock to record the
> state change in rdp->gp_seq.
>
> This is why we need an intermediate state called RCU_SEQ_STARTED
> during which note_gp_changes() can safely ignore the state change.

Ah OK, so it is not a correctness problem; that's what I was wondering.
It sounds like it is more of a performance optimization.

> Then once the root's qsmask is initialized, the state can switch
> to RCU_SEQ_WAIT_QS, after which calling note_gp_changes() becomes
> useful.

I agree, it is helpful to track the initialization through states and to
check for them in the relevant code paths.

> Hope that helped.

It does, thanks!

- Joel
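
P.S. To convince myself of the cookie arithmetic in your diagram, I replayed
it with a tiny userspace sketch. This is only an illustration: the names
mirror rcu_seq_start()/rcu_seq_end()/rcu_seq_snap()/rcu_seq_done() from
kernel/rcu/rcu.h, but I dropped the memory barriers, READ_ONCE()/WRITE_ONCE()
and the wrap-safe comparison, so please read nothing into it beyond the
numbers:

#include <stdio.h>
#include <stdbool.h>

/* Low bits of gp_seq hold the grace-period state, upper bits count GPs. */
#define RCU_SEQ_CTR_SHIFT	2
#define RCU_SEQ_STATE_MASK	((1UL << RCU_SEQ_CTR_SHIFT) - 1)

/* Mark a GP as started: state goes 0 -> 1. */
static void rcu_seq_start(unsigned long *sp)
{
	*sp += 1;
}

/* Mark the GP as ended: clear the state bits and bump the GP counter. */
static void rcu_seq_end(unsigned long *sp)
{
	*sp = (*sp & ~RCU_SEQ_STATE_MASK) + (1UL << RCU_SEQ_CTR_SHIFT);
}

/* Cookie: earliest gp_seq value proving a full GP has elapsed since now. */
static unsigned long rcu_seq_snap(unsigned long *sp)
{
	return (*sp + 2 * RCU_SEQ_STATE_MASK + 1) & ~RCU_SEQ_STATE_MASK;
}

/* Has the cookie's grace period completed? (ignoring counter wrap here) */
static bool rcu_seq_done(unsigned long *sp, unsigned long cookie)
{
	return *sp >= cookie;
}

int main(void)
{
	unsigned long gp_seq = 4;			/* idle, as in the diagram */
	unsigned long cookie = rcu_seq_snap(&gp_seq);	/* CPU 2's get_state_synchronize_rcu() */

	printf("cookie             = %lu\n", cookie);	/* 8 */
	rcu_seq_start(&gp_seq);				/* GP start, after the hotplug scan */
	printf("gp_seq after start = %lu\n", gp_seq);	/* 5 */
	rcu_seq_end(&gp_seq);				/* GP end, CPU 1 never waited on */
	printf("gp_seq after end   = %lu\n", gp_seq);	/* 8 */
	printf("cookie done?       = %d\n", rcu_seq_done(&gp_seq, cookie));	/* 1 */
	return 0;
}

So the cookie taken at gp_seq == 4 is 8, and the single start/end pair that
ignored CPU 1 drives the counter to 8, which is why the poll succeeds and
kfree(r1) runs while CPU 1 still holds r0.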