Re: [PATCH v4 3/3] rcu: Use _full() API to debug synchronize_rcu()

Joel Fernandes Mon, 03 Mar 2025 09:17:30 -0800

On Sun, Mar 02, 2025 at 07:17:10PM -0500, Joel Fernandes wrote:
> On Sun, Mar 02, 2025 at 07:15:07PM -0500, Joel Fernandes wrote:
> > On Fri, Feb 28, 2025 at 05:08:49PM -0800, Paul E. McKenney wrote:
> > > On Fri, Feb 28, 2025 at 11:59:55AM -0800, Paul E. McKenney wrote:
> > > > On Fri, Feb 28, 2025 at 08:12:51PM +0100, Uladzislau Rezki wrote:
> > > > > Hello, Paul!
> > > > > 
> > > > > > > > > > 
> > > > > > > > > > Except that I got this from overnight testing of rcu/dev on 
> > > > > > > > > > the shared
> > > > > > > > > > RCU tree:
> > > > > > > > > > 
> > > > > > > > > > WARNING: CPU: 5 PID: 14 at kernel/rcu/tree.c:1636 
> > > > > > > > > > rcu_sr_normal_complete+0x5c/0x80
> > > > > > > > > > 
> > > > > > > > > > I see this only on TREE05.  Which should not be too 
> > > > > > > > > > surprising, given
> > > > > > > > > > that this is the scenario that tests it.  It happened 
> > > > > > > > > > within five minutes
> > > > > > > > > > on all 14 of the TREE05 runs.
> > > > > > > > > > 
> > > > > > > > > Hm.. This is not fun. I tested this on my system and i did 
> > > > > > > > > not manage to
> > > > > > > > > trigger this whereas you do. Something is wrong.
> > > > > > > > 
> > > > > > > > If you have a debug patch, I would be happy to give it a go.
> > > > > > > > 
> > > > > > > I can trigger it. But.
> > > > > > > 
> > > > > > > Some background. I tested those patches during many hours on the 
> > > > > > > stable
> > > > > > > kernel which is 6.13. On that kernel i was not able to trigger 
> > > > > > > it. Running
> > > > > > > the rcutorture on the our shared "dev" tree, which i did now, 
> > > > > > > triggers this
> > > > > > > right away.
> > > > > > 
> > > > > > Bisection?  (Hey, you knew that was coming!)
> > > > > > 
> > > > > Looks like this: rcu: Fix get_state_synchronize_rcu_full() GP-start 
> > > > > detection
> > > > > 
> > > > > After revert in the dev, rcutorture passes TREE05, 16 instances.
> > > > 
> > > > Huh.  We sure don't get to revert that one...
> > > > 
> > > > Do we have a problem with the ordering in rcu_gp_init() between the 
> > > > calls
> > > > to rcu_seq_start() and portions of rcu_sr_normal_gp_init()?  For 
> > > > example,
> > > > do we need to capture the relevant portion of the list before the call
> > > > to rcu_seq_start(), and do the grace-period-start work afterwards?
> > > 
> > > I tried moving the call to rcu_sr_normal_gp_init() before the call to
> > > rcu_seq_start() and got no failures in a one-hour run of 200*TREE05.
> > > Which does not necessarily mean that this is the correct fix, but I
> > > figured that it might at least provide food for thought.
> > > 
> > >                                           Thanx, Paul
> > > 
> > > ------------------------------------------------------------------------
> > > 
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 48384fa2eaeb8..d3efeff7740e7 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -1819,10 +1819,10 @@ static noinline_for_stack bool rcu_gp_init(void)
> > >  
> > >        /* Advance to a new grace period and initialize state. */
> > >        record_gp_stall_check_time();
> > > +      start_new_poll = rcu_sr_normal_gp_init();
> > >        /* Record GP times before starting GP, hence rcu_seq_start(). */
> > >        rcu_seq_start(&rcu_state.gp_seq);
> > >        ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq);
> > > -      start_new_poll = rcu_sr_normal_gp_init();
> > >        trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, 
> > > TPS("start"));
> > 
> > Oh... so the bug is this? Good catch...
> > 
> > 
> > CPU 0                                           CPU 1
> > 
> >                                                 rcu_gp_init()
> >                                                       
> > rcu_seq_start(rcu_state.gp_seq)
> > sychronize_rcu_normal()
> >       rs.head.func
> >             = (void *) get_state_synchronize_rcu();
> >                    // save rcu_state.gp_seq
> >       rcu_sr_normal_add_req() ->
> >             llist_add(rcu_state.srs_next)
> >       (void) start_poll_synchronize_rcu();
> > 
> > 
> >                                                       sr_normal_gp_init()
> >                                                             
> > llist_add(wait_head, &rcu_state.srs_next);
> >                                                         // pick up the
> >                                                         // injected WH
> >                                                             
> > rcu_state.srs_wait_tail = wait_head;
> > 
> >                                                 rcu_gp_cleanup()
> >                                                       
> > rcu_seq_end(&rcu_state.gp_seq);
> >                                                       sr_normal_complete()
> >                                                             
> > WARN_ONCE(IS_ENABLED(CONFIG_PROVE_RCU) &&
> >                                                             
> > !poll_state_synchronize_rcu(oldstate),
> > 
> > Where as reordering sr_normal_gp_init() prevents this:
> > 
> >                                                 rcu_gp_init()
> > 
> >                                                       sr_normal_gp_init()
> >                                                         // WH has not
> >                                                         // been injected
> >                                                         // so nothing to
> >                                                         // wait on
> > 
> >                                                       
> > rcu_seq_start(rcu_state.gp_seq)
> > sychronize_rcu_normal()
> >       rs.head.func
> >             = (void *) get_state_synchronize_rcu();
> >                    // save rcu_state.gp_seq
> >       rcu_sr_normal_add_req() ->
> >             llist_add(rcu_state.srs_next)
> >       (void) start_poll_synchronize_rcu();
> > 
> >                                                 rcu_gp_cleanup()
> >                                                       
> > rcu_seq_end(&rcu_state.gp_seq);
> >                                                       // 
> > sr_normal_complete()
> >                                                   // wont do anything so
> >                                                   // no warning
> > 
> > Did I get that right?
> > 
> > I think this is a real bug AFAICS, hoping all the memory barriers are in
> > place to make sure the code reordering also correctly orders the accesses.
> > I'll double check that.
> > 
> > I also feel its 'theoretical', because as long as rcu_gp_init() and
> > rcu_gp_cleanup() are properly ordered WRT pre-existing readers, then
> > synchronize_rcu_normal() still waits for pre-existing readers even though 
> > its
> > a bit confused about the value of the cookies.
> > 
> > For the fix,
> > Reviewed-by: Joel Fernandes (Google) <j...@joelfernandes.org>
> 
> Oops, this should be:
> 
> Reviewed-by: Joel Fernandes <joelagn...@nvidia.com>


I see the original patch "rcu: Fix get_state_synchronize_rcu_full() GP-start
detection" is not yet on -next. Once we are convinced about the fix, do we
want to squash the fix into this patch and have Boqun take it?

Yet another idea is to apply it for 6.15-rc cycle if more time is needed.

Alternatively, we could squash it and I queue it for 6.16 instead of 6.15.

And I'm guessing Vlad's series is also for 6.16.

thanks,

 - Joel

Re: [PATCH v4 3/3] rcu: Use _full() API to debug synchronize_rcu()

Reply via email to