On Wed, Nov 09, 2016 at 05:44:02PM -0800, Andy Lutomirski wrote:
> On Wed, Nov 9, 2016 at 9:38 AM, Paul E. McKenney
> <paul...@linux.vnet.ibm.com> wrote:
>
> Are you planning on changing rcu_nmi_enter()?  It would make it easier
> to figure out how they interact if I could see the code.

It already calls rcu_dynticks_eqs_exit(), courtesy of the earlier
consolidation patches.

> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index dbf20b058f48..342c8ee402d6 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> >
> >  /*
> > @@ -305,17 +318,22 @@ static void rcu_dynticks_eqs_enter(void)
> >  static void rcu_dynticks_eqs_exit(void)
> >  {
> >  	struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
> > +	int seq;
> >
> >  	/*
> > -	 * CPUs seeing atomic_inc() must see prior idle sojourns,
> > +	 * CPUs seeing atomic_inc_return() must see prior idle sojourns,
> >  	 * and we also must force ordering with the next RCU read-side
> >  	 * critical section.
> >  	 */
> > -	smp_mb__before_atomic(); /* See above. */
> > -	atomic_inc(&rdtp->dynticks);
> > -	smp_mb__after_atomic(); /* See above. */
> > +	seq = atomic_inc_return(&rdtp->dynticks);
> >  	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
> > -		     !(atomic_read(&rdtp->dynticks) & 0x1));
> > +		     !(seq & RCU_DYNTICK_CTRL_CTR));
>
> I think there's still a race here.  Suppose we're running this code on
> cpu n and...
>
> > +	if (seq & RCU_DYNTICK_CTRL_MASK) {
> > +		rcu_eqs_special_exit();
> > +		/* Prefer duplicate flushes to losing a flush. */
> > +		smp_mb__before_atomic(); /* NMI safety. */
>
> ... another CPU changes the page tables and calls rcu_eqs_special_set(n) here.

But then rcu_eqs_special_set() will return false because we already
exited the extended quiescent state at the atomic_inc_return() above.

That should tell the caller to send an IPI.

> That CPU expects that we will flush prior to continuing, but we won't.
> Admittedly it's highly unlikely that any stale TLB entries would be
> created yet, but nothing rules it out.

That said, 0day is having some heartburn from this, so I must have broken
something somewhere.  My own tests of course complete just fine...

> > +		atomic_and(~RCU_DYNTICK_CTRL_MASK, &rdtp->dynticks);
> > +	}
>
> Maybe the way to handle it is something like:
>
> this_cpu_write(rcu_nmi_needs_eqs_special, 1);
> barrier();
>
> /* NMI here will call rcu_eqs_special_exit() regardless of the value
> in dynticks */
>
> atomic_and(...);
> smp_mb__after_atomic();
> rcu_eqs_special_exit();
>
> barrier();
> this_cpu_write(rcu_nmi_needs_eqs_special, 0);
>
>
> Then rcu_nmi_enter() would call rcu_eqs_special_exit() if the dynticks
> bit is set *or* rcu_nmi_needs_eqs_special is set.
>
> Does that make sense?

I believe that rcu_eqs_special_set() returning false covers this, but
could easily be missing something.

							Thanx, Paul
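
For anyone following along, the "returns false once the CPU has left the
extended quiescent state" argument can be modeled in user space.  The
sketch below is only an illustration under stated assumptions, not the
code from the patch: it uses C11 atomics instead of the kernel's
atomic_t, a single global counter instead of per-CPU data, and a guessed
bit layout (CTRL_MASK in bit 0, CTRL_CTR as the lowest counter bit, with
the counter advancing by CTRL_CTR; the quoted hunk uses
atomic_inc_return(), so the real layout may differ).  The names
eqs_enter(), eqs_exit(), eqs_special_set() and flushes are hypothetical
stand-ins.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed bit layout, for illustration only. */
#define CTRL_MASK 0x1             /* "special work pending" flag */
#define CTRL_CTR  (CTRL_MASK + 1) /* lowest counter bit; set means not in EQS */

static atomic_int dynticks = CTRL_CTR; /* start outside the EQS */
static int flushes;                    /* stand-in for rcu_eqs_special_exit() work */

/* Enter the extended quiescent state: counter bit becomes clear. */
static void eqs_enter(void)
{
	int seq = atomic_fetch_add(&dynticks, CTRL_CTR) + CTRL_CTR;

	assert(!(seq & CTRL_CTR));
}

/*
 * Remote side: request special work, but only if the target is still
 * in the EQS.  A false return means the caller must fall back to an IPI.
 */
static bool eqs_special_set(void)
{
	int old = atomic_load(&dynticks);

	do {
		if (old & CTRL_CTR)
			return false; /* target already left the EQS */
	} while (!atomic_compare_exchange_weak(&dynticks, &old,
					       old | CTRL_MASK));
	return true;
}

/* Leave the EQS; a single atomic RMW both exits and samples the flag. */
static void eqs_exit(void)
{
	int seq = atomic_fetch_add(&dynticks, CTRL_CTR) + CTRL_CTR;

	assert(seq & CTRL_CTR);
	if (seq & CTRL_MASK) {
		flushes++; /* stand-in for the deferred flush */
		atomic_fetch_and(&dynticks, ~CTRL_MASK);
	}
}
```

In this model, a request made while the counter bit is clear succeeds and
is picked up by the next eqs_exit(), while a request made after
eqs_exit() has done its atomic add fails, which is the case where the
requester would send an IPI.  The model does not cover NMIs arriving
between the flush and the atomic_fetch_and(), which is the window the
rcu_nmi_needs_eqs_special suggestion is about.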