On Sat, Mar 22, 2025 at 03:06:08AM +0100, Joel Fernandes wrote:
> Insomnia kicked in, so 3 am reply here (Zurich local time) ;-):
>
> On 3/20/2025 3:15 PM, Frederic Weisbecker wrote:
> > On Wed, Mar 19, 2025 at 03:38:31PM -0400, Joel Fernandes wrote:
> >> On Tue, Mar 18, 2025 at 11:37:38AM -0700, Paul E. McKenney wrote:
> >>> On Tue, Mar 18, 2025 at 02:56:18PM +0100, Frederic Weisbecker wrote:
> >>>> The numbers used in rcu_seq_done_exact() lack some explanation behind
> >>>> their magic. Especially after the commit:
> >>>>
> >>>>     85aad7cc4178 ("rcu: Fix get_state_synchronize_rcu_full() GP-start detection")
> >>>>
> >>>> which reported a subtle issue where a new GP sequence snapshot was taken
> >>>> on the root node state while a grace period had already been started and
> >>>> reflected on the global state sequence but not yet on the root node
> >>>> sequence, making a polling user wait on a wrong, already started grace
> >>>> period that would ignore freshly online CPUs.
> >>>>
> >>>> The fix involved taking the snapshot on the global state sequence and
> >>>> waiting on the root node sequence. And since a grace period is first
> >>>> started on the global state and only afterward reflected on the root
> >>>> node, a snapshot taken on the global state sequence might be two full
> >>>> grace periods ahead of the root node as in the following example:
> >>>>
> >>>> rnp->gp_seq = rcu_state.gp_seq = 0
> >>>>
> >>>>     CPU 0                                 CPU 1
> >>>>     -----                                 -----
> >>>>     // rcu_state.gp_seq = 1
> >>>>     rcu_seq_start(&rcu_state.gp_seq)
> >>>>                                           // snap = 8
> >>>>                                           snap = rcu_seq_snap(&rcu_state.gp_seq)
> >>>>                                           // Two full GP differences
> >>>>                                           rcu_seq_done_exact(&rnp->gp_seq, snap)
> >>>>     // rnp->gp_seq = 1
> >>>>     WRITE_ONCE(rnp->gp_seq, rcu_state.gp_seq);
> >>>>
> >>>> Add a comment about those expectations and to clarify the magic within
> >>>> the relevant function.
> >>>>
> >>>> Signed-off-by: Frederic Weisbecker <frede...@kernel.org>
> >>> Reviewed-by: Paul E. McKenney <paul...@kernel.org>
> >>>
> >>> But it would of course be good to get reviews from the others.
> >> I actually don't agree that the magic in the rcu_seq_done_exact() function
> >> about the ~2 GPs is related to the lag between rcu_state.gp_seq and root
> >> rnp->gp_seq, because the small lag can just as well survive with the
> >> rcu_seq_done() function in the above sequence, right?
> >>
> >> The rcu_seq_done_exact() function on the other hand is more about not being
> >> stuck in the ULONG_MAX/2 guard band, but to actually get to that, you need
> >> a wrap around to happen and the delta between "rnp->gp_seq" and "snap" to
> >> be at least ULONG_MAX/2 AFAIU.
> >>
> >> So the only time this magic will matter is if you have a huge delta between
> >> what is being compared, not just 2 GPs.
> > You're right, and perhaps I should have made it more specific that my
> > comment only explains the magic "3" number here, in that if it were "2"
> > instead, there could be accidents with 2 full GPs difference (which is
> > possible) spuriously accounted as a wrap around.
>
> Ahh, so I guess I get it now and we are both right. The complete picture is:
> we are trying to handle the case of a "very large wrap" around, but as a part
> of that, we don't want to create false positives for this "snap" case.
>
> A "snap" can be at most (2 * RCU_SEQ_STATE_MASK + 1) away from a gp_seq.
>
> That's within "2 GPs" worth of counts (about 8 counts).
>
> Taking some numbers:
>
> cur_s   s       delta (s - cur_s)
> 0       4       4
> 1       8       7
> 2       8       6
> 3       8       5
> 4       8       4
> 5       12      7
>
> The maximum delta of a snap from actual gp_seq can be
> (2 * RCU_SEQ_STATE_MASK + 1), which in this case is 7.
>
> So we adjust the comparison by adding the
> ULONG_CMP_LT(cur_s, s - (2 * RCU_SEQ_STATE_MASK + 1)). i.e.
3, right?

> after a snap, if we blindly do ULONG_CMP_LT without adjustment, we'll falsely
> conclude that the GP has completed thinking it was due to wrap around, whereas
> it is possible we just snapped and got a false positive.
>
> So I think your comment is mostly correct then. But I think it may be better
> to clarify that the reason we need rcu_seq_done_exact() and that ULONG_CMP_LT
> is because we want to handle very large wrap around without being stuck in
> "false negative" territory as we would with rcu_seq_done(). But that also
> means we can't break the "snap" use case with the introduction of
> ULONG_CMP_LT.

Indeed.

> Unless you beat me to it, I may modify your patch for v6.16 augmented with
> this reasoning ;) (Also since I am also working on adding that forced wrap
> around test to rcutorture).

Please do, I'd appreciate it!

> Also it is still not fully clear to me what the root node has to do with all
> this in your example, because the rcu_seq_done_exact() needs to be what it is
> (that is having that 2 GP adjustment) even if the rnp->gp_seq and
> rcu_state.gp_seq were in sync?

Yes, this is only to explain that the maximum drift between the snap on rsp
and the current state on root rnp can be at most 2 full GPs. And that explains
the "3" magic in the function. But if they were in sync it's all fine.

Thanks.

> thanks,
>
>  - Joel
>
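
For anyone who wants to replay the arithmetic discussed in this thread, here is
a minimal userspace sketch. It is an illustration under assumptions, not the
kernel code: RCU_SEQ_STATE_MASK and the ULONG_CMP_*() macros are copied as I
understand them from the kernel, while seq_snap(), seq_done(), seq_done_exact()
and its multiplier argument k are hypothetical helpers that take plain values
instead of pointers. seq_selfcheck() checks the maximum snap delta of 7, the
root-lag example (root gp_seq = 0, snap = 8) with "2" versus "3", and the huge
delta case where rcu_seq_done() would stay stuck in the guard band.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Assumed to match the kernel layout: the low two bits hold the GP state. */
#define RCU_SEQ_CTR_SHIFT	2
#define RCU_SEQ_STATE_MASK	((1UL << RCU_SEQ_CTR_SHIFT) - 1)

/* Wrap-aware comparisons, written like the kernel's ULONG_CMP_*() macros. */
#define ULONG_CMP_GE(a, b)	(ULONG_MAX / 2 >= (a) - (b))
#define ULONG_CMP_LT(a, b)	(ULONG_MAX / 2 < (a) - (b))

/* Hypothetical value-based stand-in for rcu_seq_snap(). */
static inline unsigned long seq_snap(unsigned long cur)
{
	return (cur + 2 * RCU_SEQ_STATE_MASK + 1) & ~RCU_SEQ_STATE_MASK;
}

/* Hypothetical stand-in for rcu_seq_done(): keeps the ULONG_MAX/2 guard band. */
static inline bool seq_done(unsigned long cur, unsigned long s)
{
	return ULONG_CMP_GE(cur, s);
}

/*
 * Hypothetical stand-in for rcu_seq_done_exact(), with the magic multiplier
 * exposed as k so that "2" and "3" can be compared side by side.
 */
static inline bool seq_done_exact(unsigned long cur, unsigned long s,
				  unsigned long k)
{
	return ULONG_CMP_GE(cur, s) ||
	       ULONG_CMP_LT(cur, s - (k * RCU_SEQ_STATE_MASK + 1));
}

static inline void seq_selfcheck(void)
{
	unsigned long cur, max_delta = 0;

	/* A snap is never more than 2 * RCU_SEQ_STATE_MASK + 1 = 7 ahead. */
	for (cur = 0; cur < 64; cur++) {
		unsigned long delta = seq_snap(cur) - cur;

		if (delta > max_delta)
			max_delta = delta;
	}
	assert(max_delta == 2 * RCU_SEQ_STATE_MASK + 1);

	/* Root lag: rcu_state.gp_seq is 1 (snap = 8), root rnp->gp_seq still 0. */
	assert(seq_snap(1) == 8);
	assert(seq_done_exact(0, 8, 2));	/* "2": spuriously reports done */
	assert(!seq_done_exact(0, 8, 3));	/* "3": correctly keeps waiting */

	/* Huge delta: the counter is more than ULONG_MAX/2 past the snapshot. */
	cur = 8 + ULONG_MAX / 2 + 1;
	assert(!seq_done(cur, 8));		/* false negative in the guard band */
	assert(seq_done_exact(cur, 8, 3));	/* exact variant reports done */
}
```

Calling seq_selfcheck() from a small main() and building with assertions
enabled should pass silently if the reasoning in this thread holds for these
userspace copies.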