On Mon, Feb 13, 2017 at 10:24:38AM +0800, panxinhui wrote:
> 
> 
> > On 2017/2/10 4:53 AM, Waiman Long wrote:
> > On 02/07/2017 10:39 PM, Xinhui Pan wrote:
> >>
> >>
> >> 2016-12-26 4:26 GMT+08:00 Waiman Long <long...@redhat.com>:
> >>
> >>     A number of cmpxchg calls in qspinlock_paravirt.h were replaced by
> >>     more relaxed versions to improve performance on architectures that
> >>     use LL/SC.
> >>
> >>     All the locking related cmpxchg's are replaced with the _acquire
> >>     variants:
> >>      - pv_queued_spin_steal_lock()
> >>      - trylock_clear_pending()
> >>
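> >>     For illustration, the first of these ends up looking roughly like
> >>     this (a sketch paraphrased from qspinlock_paravirt.h, not the
> >>     literal hunk):
> >>
> >>         static inline bool pv_queued_spin_steal_lock(struct qspinlock *lock)
> >>         {
> >>                 struct __qspinlock *l = (void *)lock;
> >>
> >>                 /*
> >>                  * The _acquire semantics is needed only on success,
> >>                  * when the lock is actually acquired.
> >>                  */
> >>                 return !(atomic_read(&lock->val) & _Q_TAIL_MASK) &&
> >>                        (cmpxchg_acquire(&l->locked, 0, _Q_LOCKED_VAL) == 0);
> >>         }
> >>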
> >>     The cmpxchg's related to hashing are replaced by either the _release
> >>     or the _relaxed variants. See the inline comments for details.
> >>
> >>     Signed-off-by: Waiman Long <long...@redhat.com>
> >>
> >>      v1->v2:
> >>       - Add comments in changelog and code for the rationale of the change.
> >>
> >>     ---
> >>      kernel/locking/qspinlock_paravirt.h | 50 ++++++++++++++++++++++++-------------
> >>      1 file changed, 33 insertions(+), 17 deletions(-)
> >>
> >>
> >>     @@ -323,8 +329,14 @@ static void pv_wait_node(struct mcs_spinlock *node, struct mcs_spinlock *prev)
> >>                      * If pv_kick_node() changed us to vcpu_hashed, retain that
> >>                      * value so that pv_wait_head_or_lock() knows to not also try
> >>                      * to hash this lock.
> >>     +                *
> >>     +                * The smp_store_mb() and control dependency above will ensure
> >>     +                * that state change won't happen before that. Synchronizing
> >>     +                * with pv_kick_node() wrt hashing by this waiter or by the
> >>     +                * lock holder is done solely by the state variable. There is
> >>     +                * no other ordering requirement.
> >>                      */
> >>     -               cmpxchg(&pn->state, vcpu_halted, vcpu_running);
> >>     +               cmpxchg_relaxed(&pn->state, vcpu_halted, vcpu_running);
> >>
> >>                     /*
> >>                      * If the locked flag is still not set after wakeup, it is a
> >>     @@ -360,9 +372,12 @@ static void pv_kick_node(struct qspinlock *lock, struct mcs_spinlock *node)
> >>              * pv_wait_node(). If OTOH this fails, the vCPU was running and will
> >>              * observe its next->locked value and advance itself.
> >>              *
> >>     -        * Matches with smp_store_mb() and cmpxchg() in pv_wait_node()
> >>     +        * Matches with smp_store_mb() and cmpxchg_relaxed() in pv_wait_node().
> >>     +        * A release barrier is used here to ensure that node->locked is
> >>     +        * always set before changing the state. See comment in pv_wait_node().
> >>              */
> >>     -       if (cmpxchg(&pn->state, vcpu_halted, vcpu_hashed) != vcpu_halted)
> >>     +       if (cmpxchg_release(&pn->state, vcpu_halted, vcpu_hashed)
> >>     +                       != vcpu_halted)
> >>                     return;
> >>
> >> hi, Waiman
> >> We can't use _release here; a full barrier is needed.
> >>
> >> There is a race between pv_kick_node and pv_wait_head_or_lock:
> >>
> >> pv_kick_node                            pv_wait_head_or_lock
> >>
> >> [W] l->locked = _Q_SLOW_VAL
> >>     // reordered up here, ahead of
> >>     // the cmpxchg_release below
> >>                                         if (READ_ONCE(pn->state) == vcpu_hashed)
> >>                                                 // False, so this is skipped:
> >>                                                 lp = (struct qspinlock **)1;
> >> [STORE] pn->state = vcpu_hashed
> >> pv_hash()                               lp = pv_hash(lock, pn);
> >>                                         if (xchg(&l->locked, _Q_SLOW_VAL) == 0)
> >>                                                 // false, so not unhashed.
> >>
> >> The same lock is thus hashed twice but unhashed only once. Eventually,
> >> as the hash table grows big, we hit an RCU stall.
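> >>
> >> To make the window concrete, here is the tail of pv_kick_node() with the
> >> proposed _release variant (paraphrased from qspinlock_paravirt.h with my
> >> annotations; a sketch, not the exact source):
> >>
> >>     static void pv_kick_node(struct qspinlock *lock, struct mcs_spinlock *node)
> >>     {
> >>             struct pv_node *pn = (struct pv_node *)node;
> >>             struct __qspinlock *l = (void *)lock;
> >>
> >>             /*
> >>              * _release only orders the accesses before it; the two
> >>              * stores below may become visible before pn->state is
> >>              * seen as vcpu_hashed.
> >>              */
> >>             if (cmpxchg_release(&pn->state, vcpu_halted, vcpu_hashed)
> >>                             != vcpu_halted)
> >>                     return;
> >>
> >>             /*
> >>              * If this store is observed first, the waiter's
> >>              * xchg(&l->locked, _Q_SLOW_VAL) returns non-zero and it
> >>              * skips the unhash path, yet both sides call pv_hash().
> >>              */
> >>             WRITE_ONCE(l->locked, _Q_SLOW_VAL);
> >>             (void)pv_hash(lock, pn);
> >>     }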
> >>
> >> I hit the RCU stall when running the netperf benchmark.
> >>
> >> thanks
> >> xinhui
> >>
> >>
> >>     --
> >>     1.8.3.1
> >>
> >>
> > Yes, I know I am being too aggressive in this patch. I am going to tone it
> > down a bit. I just haven't had time to run a performance test on a PPC
> > system to verify the gain yet. I am planning to send an updated patch soon.
> > 
> hi, All
> 
> I guess I have found the scenario that causes the RCU stall.
> 

Yes, I believe that's the case, as the comment before smp_store_mb() in
pv_wait_node() states. Good show!

Regards,
Boqun

> pv_wait_node                                  lock holder
>
>                                               [L] pn->state
>                                                   // the load half of the cmpxchg_release
>                                                   // below, reordered up here
> smp_store_mb(pn->state, vcpu_halted);
> if (!READ_ONCE(node->locked))
>                                               arch_mcs_spin_unlock_contended(&next->locked);
>
>                                               pv_kick_node:
>                                               [-L] cmpxchg_release(&pn->state, vcpu_halted, vcpu_hashed)
>                                                   // fails, so pn->state stays as it is
> pv_wait(&pn->state, vcpu_halted);
> // on PPC, this will not return until pn->state != vcpu_halted
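>
> For reference, the waiter side of this in pv_wait_node() is roughly the
> following (paraphrased; the stat accounting is omitted):
>
>     /* Full barrier: order the state store before re-reading locked. */
>     smp_store_mb(pn->state, vcpu_halted);
>
>     if (!READ_ONCE(node->locked))
>             /*
>              * If pv_kick_node()'s hoisted load saw vcpu_running, its
>              * cmpxchg_release() fails and nothing ever moves pn->state
>              * away from vcpu_halted, so on PPC this never returns.
>              */
>             pv_wait(&pn->state, vcpu_halted);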
>               
> And when the RCU stall hit, I fired a BUG() and entered debug mode; it
> seems most CPUs are in pv_wait...
> 
> So the solution to this problem is simple: keep the cmpxchg as it is in
> pv_kick_node; cmpxchg on PPC provides full barriers.
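>
> That is, essentially revert the pv_kick_node hunk quoted above (a sketch
> of the fixup, not a tested patch):
>
>     -       if (cmpxchg_release(&pn->state, vcpu_halted, vcpu_hashed)
>     -                       != vcpu_halted)
>     +       if (cmpxchg(&pn->state, vcpu_halted, vcpu_hashed) != vcpu_halted)
>                     return;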
> 
> thanks
> xinhui
> 
> > Cheers,
> > Longman
> > 
> 
