On Thu, Oct 08, 2015 at 11:49:33AM +0200, Peter Zijlstra wrote:
> On Wed, Oct 07, 2015 at 09:48:58AM -0700, Paul E. McKenney wrote:
> 
> > > Some implementation choice requires this barrier upgrade -- and in
> > > another email I suggest it's the whole tree thing: we need to firmly
> > > establish the state of one level before propagating the state up, etc.
> > > 
> > > Now I'm not entirely sure this is fully correct, but it's the best I
> > > could come up with.
> > 
> > It is pretty close.  Ignoring dyntick idle for the moment, things
> > go (very) roughly like this:
> > 
> > o   The RCU grace-period kthread notices that a new grace period
> >     is needed.  It initializes the tree, which includes acquiring
> >     every rcu_node structure's ->lock.
> > 
> > o   CPU A notices that there is a new grace period.  It acquires
> >     the ->lock of its leaf rcu_node structure, which forces full
> >     ordering against the grace-period kthread.
> 
> If the kthread took _all_ rcu_node locks, then this does not require the
> barrier upgrade because they will share a lock variable.
> 
> > o   Some time later, CPU A realizes that it has passed
> >     through a quiescent state, and again acquires its leaf rcu_node
> >     structure's ->lock, again enforcing full ordering, but this
> >     time against all CPUs corresponding to this same leaf rcu_node
> >     structure that previously noticed quiescent states for this
> >     same grace period.  Also against all prior readers on this
> >     same CPU.
> 
> This again reads as if the same lock variable is involved, and therefore
> the barrier upgrade is not required for this.
> 
> > o   Some time later, CPU B (corresponding to that same leaf
> >     rcu_node structure) is the last of that leaf's group of CPUs
> >     to notice a quiescent state.  It has also acquired that leaf's
> >     ->lock, again forcing ordering against its prior RCU read-side
> >     critical sections, but also against all the prior RCU
> >     read-side critical sections of all other CPUs corresponding
> >     to this same leaf.
> 
> same lock var again..
> 
> > o   CPU B therefore moves up the tree, acquiring the parent
> >     rcu_node structure's ->lock.  In so doing, it forces full
> >     ordering against all prior RCU read-side critical sections
> >     of all CPUs corresponding to all leaf rcu_node structures
> >     subordinate to the current (non-leaf) rcu_node structure.
> 
> And here we iterate the tree and get another lock var involved; here the
> barrier upgrade will actually do something.

Yep.  And I am way too lazy to sort out exactly which acquisitions really
truly need smp_mb__after_unlock_lock() and which don't.  Besides, if I
tried to sort it out, I would occasionally get it wrong, and this would be
a real pain to debug.  Therefore, I simply do smp_mb__after_unlock_lock()
on all acquisitions of the rcu_node structures' ->lock fields.  I can
actually validate that!  ;-)
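A minimal userspace sketch of that coding rule, assuming POSIX threads and C11 atomics. The names `struct rnp_model`, `rnp_lock()`, `rnp_unlock()` and `model_mb_after_unlock_lock()` are hypothetical, and the sequentially consistent fence is only a stand-in for the kernel's smp_mb__after_unlock_lock(), which can be cheaper on architectures where UNLOCK followed by LOCK already orders fully. The point is that the barrier lives inside the one helper every acquisition goes through, so no call site has to decide whether it needs the upgrade.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-in for an rcu_node: only ->lock matters here. */
struct rnp_model {
	pthread_mutex_t lock;
};

/*
 * Stand-in for smp_mb__after_unlock_lock(): a full seq_cst fence.
 * This is an assumption for the sketch; the kernel primitive can be cheaper.
 */
static inline void model_mb_after_unlock_lock(void)
{
	atomic_thread_fence(memory_order_seq_cst);
}

/* Every ->lock acquisition goes through here, so every one gets the upgrade. */
static void rnp_lock(struct rnp_model *rnp)
{
	int ret = pthread_mutex_lock(&rnp->lock);

	assert(ret == 0);
	(void)ret;	/* avoid an unused-variable warning under NDEBUG */
	model_mb_after_unlock_lock();
}

static void rnp_unlock(struct rnp_model *rnp)
{
	int ret = pthread_mutex_unlock(&rnp->lock);

	assert(ret == 0);
	(void)ret;
}
```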

> > o   And so on, up the tree.
> 
> idem..
> 
> > o   When CPU C reaches the root of the tree, and realizes that
> >     it is the last CPU to report a quiescent state for the
> >     current grace period, its acquisition of the root rcu_node
> >     structure's ->lock has forced full ordering against all
> >     RCU read-side critical sections that started before this
> >     grace period -- on all CPUs.
> 
> Right, which makes the full barrier transitivity thing important.
> 
> >     CPU C therefore awakens the grace-period kthread.
> 
> > o   When the grace-period kthread wakes up, it does cleanup,
> >     which (you guessed it!) requires acquiring the ->lock of
> >     each rcu_node structure.  This not only forces full ordering
> >     against each pre-existing RCU read-side critical section,
> >     it also sets up things so that...
> 
> Again, if it takes _all_ rcu_node locks, it also shares a lock variable and
> hence the upgrade is not required.
> 
> > o   When CPU D notices that the grace period ended, it does so
> >     while holding its leaf rcu_node structure's ->lock.  This
> >     forces full ordering against all relevant RCU read-side
> >     critical sections.  This ordering prevails when CPU D later
> >     starts invoking RCU callbacks.
> 
> This also does not seem to require the upgrade.
> 
> > Hey, you asked!!!  ;-)
> 
> No, I asked what all the barrier upgrade was for, most of the above does
> not seem to rely on that at all.
> 
> The only place this upgrade matters is the UNLOCK x + LOCK y scenario,
> as also per the comment above smp_mb__after_unlock_lock().
> 
> Any other ordering relies not on this but on the other primitives, and is
> irrelevant to the barrier upgrade.

I am still keeping an smp_mb__after_unlock_lock() after every ->lock.
Trying to track which needs it and which does not is asking for
subtle bugs.
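To make the UNLOCK x + LOCK y case concrete, here is a simplified userspace model of a CPU reporting a quiescent state up the tree. The fields are loosely modeled on struct rcu_node (qsmask, grpmask, parent), but `struct qs_node`, `report_qs()` and `full_fence()` are hypothetical, and the fence is again only a stand-in for smp_mb__after_unlock_lock(). The fence after acquiring the parent's lock is where the upgrade actually does something, because the child level's state was protected by a different lock; the fence after the initial leaf acquisition is the one the simple rule adds even though, as Peter argues, the shared lock variable already orders it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical combining-tree node, loosely modeled on struct rcu_node. */
struct qs_node {
	pthread_mutex_t lock;
	unsigned long qsmask;    /* children/CPUs that have not yet reported */
	unsigned long grpmask;   /* this node's bit in parent->qsmask */
	struct qs_node *parent;  /* NULL at the root */
};

/* Userspace stand-in for smp_mb__after_unlock_lock() (an assumption). */
static inline void full_fence(void)
{
	atomic_thread_fence(memory_order_seq_cst);
}

/*
 * Clear @mask in @rnp and propagate upward while this report is the last
 * one at its level.  Returns true if it cleared the root's last bit, that
 * is, the caller would wake the grace-period kthread.
 */
static bool report_qs(struct qs_node *rnp, unsigned long mask)
{
	pthread_mutex_lock(&rnp->lock);
	full_fence();
	for (;;) {
		assert(rnp->qsmask & mask);	/* no double reports */
		rnp->qsmask &= ~mask;
		if (rnp->qsmask != 0 || rnp->parent == NULL)
			break;
		/* Last at this level: UNLOCK x ... */
		mask = rnp->grpmask;
		struct qs_node *parent = rnp->parent;
		pthread_mutex_unlock(&rnp->lock);
		/* ... + LOCK y: the case the upgrade exists for. */
		rnp = parent;
		pthread_mutex_lock(&rnp->lock);
		full_fence();
	}
	bool done = (rnp->parent == NULL && rnp->qsmask == 0);
	pthread_mutex_unlock(&rnp->lock);
	return done;
}
```

With a two-leaf tree of two CPUs each, only the very last report (the one that empties the root) returns true; earlier reports stop at whichever level still has bits set.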

> > Again, this is a cartoon-like view of the ordering that leaves out a
> > lot of details, but it should get across the gist of the ordering.
> 
> So the ordering I'm interested in is the bit that is provided by the
> barrier upgrade, and that seems very limited and directly pertains to
> the tree iteration, ensuring it's fully separated and transitive.
> 
> So I'll stick to the explanation that the barrier upgrade is purely for the
> tree iteration, to separate and make transitive the tree-level state.

Fair enough, but I will be sticking to the simple coding rule that keeps
RCU out of trouble!
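For the grace-period kthread's side (initialization and cleanup, which acquire every rcu_node structure's ->lock), a similarly hedged sketch: `struct gp_node` and `gp_init()` are hypothetical, the fields are loosely modeled on struct rcu_node, and the nodes are assumed to be stored breadth-first with the root first. Each acquisition here shares a lock variable with the CPUs that later take the same node's lock, which is why Peter argues the upgrade is not required; under the simple coding rule it is applied anyway.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical node; fields loosely modeled on struct rcu_node. */
struct gp_node {
	pthread_mutex_t lock;
	unsigned long qsmask;      /* bits still to report this grace period */
	unsigned long qsmaskinit;  /* bits that must report every grace period */
	unsigned long gpnum;       /* most recent grace period this node saw */
};

/* Start grace period @gpnum by visiting every node under its own lock. */
static void gp_init(struct gp_node *nodes, size_t n, unsigned long gpnum)
{
	for (size_t i = 0; i < n; i++) {
		struct gp_node *rnp = &nodes[i];

		pthread_mutex_lock(&rnp->lock);
		/* Simple rule: upgrade after every ->lock acquisition. */
		atomic_thread_fence(memory_order_seq_cst);
		assert(rnp->qsmask == 0);	/* previous grace period fully done */
		rnp->qsmask = rnp->qsmaskinit;
		rnp->gpnum = gpnum;
		pthread_mutex_unlock(&rnp->lock);
	}
}
```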

                                                        Thanx, Paul
