On Mon, Feb 13, 2017 at 10:24:38AM +0800, panxinhui wrote:
> 
> On 2017/2/10 4:53 AM, Waiman Long wrote:
> > On 02/07/2017 10:39 PM, Xinhui Pan wrote:
> >>
> >> 2016-12-26 4:26 GMT+08:00 Waiman Long <long...@redhat.com
> >> <mailto:long...@redhat.com>>:
> >>
> >>     A number of cmpxchg calls in qspinlock_paravirt.h were replaced
> >>     by more relaxed versions to improve performance on architectures
> >>     that use LL/SC.
> >>
> >>     All the locking-related cmpxchg's are replaced with the _acquire
> >>     variants:
> >>      - pv_queued_spin_steal_lock()
> >>      - trylock_clear_pending()
> >>
> >>     The cmpxchg's related to hashing are replaced by either the
> >>     _release or the _relaxed variants. See the inline comments for
> >>     details.
> >>
> >>     Signed-off-by: Waiman Long <long...@redhat.com
> >>     <mailto:long...@redhat.com>>
> >>
> >>     v1->v2:
> >>      - Add comments in changelog and code for the rationale of the
> >>        change.
> >>
> >>     ---
> >>      kernel/locking/qspinlock_paravirt.h | 50 ++++++++++++++++++++++++-------------
> >>      1 file changed, 33 insertions(+), 17 deletions(-)
> >>
> >>     @@ -323,8 +329,14 @@ static void pv_wait_node(struct mcs_spinlock *node, struct mcs_spinlock *prev)
> >>               * If pv_kick_node() changed us to vcpu_hashed, retain that
> >>               * value so that pv_wait_head_or_lock() knows to not also try
> >>               * to hash this lock.
> >>     +         *
> >>     +         * The smp_store_mb() and control dependency above will ensure
> >>     +         * that state change won't happen before that. Synchronizing
> >>     +         * with pv_kick_node() wrt hashing by this waiter or by the
> >>     +         * lock holder is done solely by the state variable. There is
> >>     +         * no other ordering requirement.
> >>               */
> >>     -        cmpxchg(&pn->state, vcpu_halted, vcpu_running);
> >>     +        cmpxchg_relaxed(&pn->state, vcpu_halted, vcpu_running);
> >>
> >>              /*
> >>               * If the locked flag is still not set after wakeup, it is a
> >>     @@ -360,9 +372,12 @@ static void pv_kick_node(struct qspinlock *lock, struct mcs_spinlock *node)
> >>               * pv_wait_node(). If OTOH this fails, the vCPU was running and will
> >>               * observe its next->locked value and advance itself.
> >>               *
> >>     -         * Matches with smp_store_mb() and cmpxchg() in pv_wait_node()
> >>     +         * Matches with smp_store_mb() and cmpxchg_relaxed() in pv_wait_node().
> >>     +         * A release barrier is used here to ensure that node->locked is
> >>     +         * always set before changing the state. See comment in pv_wait_node().
> >>               */
> >>     -        if (cmpxchg(&pn->state, vcpu_halted, vcpu_hashed) != vcpu_halted)
> >>     +        if (cmpxchg_release(&pn->state, vcpu_halted, vcpu_hashed)
> >>     +            != vcpu_halted)
> >>                      return;
> >>
> >> hi, Waiman
> >> We can't use _release here; a full barrier is needed.
> >>
> >> There is pv_kick_node vs. pv_wait_head_or_lock:
> >>
> >> pv_kick_node                          pv_wait_head_or_lock
> >>
> >> [w] l->locked = _Q_SLOW_VAL // reordered here
> >>
> >>                                       if (READ_ONCE(pn->state) == vcpu_hashed) // False.
> >>                                               lp = (struct qspinlock **)1;
> >>
> >> [STORE] pn->state = vcpu_hashed       lp = pv_hash(lock, pn);
> >> pv_hash()                             if (xchg(&l->locked, _Q_SLOW_VAL) == 0) // false, not unhashed.
> >>
> >> Then the same lock has been hashed twice but unhashed only once. So
> >> eventually, as the hash table grows big, we hit an RCU stall.
> >>
> >> I hit an RCU stall when I ran the netperf benchmark.
> >>
> >> thanks
> >> xinhui
> >>
> >> --
> >> 1.8.3.1
> >>
> > Yes, I know I am being too aggressive in this patch. I am going to tone
> > it down a bit. I just don't have time to run a performance test on a
> > PPC system to verify the gain yet. I am planning to send an updated
> > patch soon.
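To see exactly how the bookkeeping goes wrong, the interleaving Xinhui
describes above can be replayed step by step. Below is a minimal
userspace sketch, not the kernel code: a single thread executes both
CPUs' steps in the problematic visibility order, and pv_hash() /
pv_unhash() are modeled as bare counters.

/* replay.c -- deterministic replay of the double-hash leak above. */
#include <assert.h>
#include <stdio.h>

enum vcpu_state { vcpu_running, vcpu_halted, vcpu_hashed };
#define _Q_SLOW_VAL	3

static int locked;			/* models lock->locked       */
static int state = vcpu_halted;		/* models pn->state          */
static int hashed, unhashed;		/* models hash-table entries */

int main(void)
{
	/*
	 * CPU A (pv_kick_node): with cmpxchg_release, the later store
	 * to l->locked may become visible before the vcpu_hashed store.
	 */
	locked = _Q_SLOW_VAL;

	/*
	 * CPU B (pv_wait_head_or_lock): does not see vcpu_hashed yet,
	 * so it decides it must hash the lock itself.
	 */
	if (state != vcpu_hashed)
		hashed++;			/* lp = pv_hash(lock, pn) */

	/* CPU A: the delayed stores become visible; A hashes as well. */
	state = vcpu_hashed;
	hashed++;				/* pv_hash(lock, pn) */

	/*
	 * CPU B: xchg(&l->locked, _Q_SLOW_VAL) returns _Q_SLOW_VAL, not
	 * 0, so B does NOT unhash its extra entry.
	 */
	if (locked == 0)
		unhashed++;

	/* The eventual unlock unhashes exactly once. */
	unhashed++;

	printf("hashed=%d unhashed=%d -> %d entry leaked\n",
	       hashed, unhashed, hashed - unhashed);
	assert(hashed == 2 && unhashed == 1);
	return 0;
}

Each time this window hits, one hash entry is leaked; as the table
fills up, pv_hash() has to scan longer and longer for a free bucket,
which is consistent with the RCU stalls seen under netperf.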
> 
> hi, All
> I guess I have found the scenario that causes the RCU stall.
> 
Yes, I believe that's the case, as the comment before the smp_store_mb()
in pv_wait_node() says. Good show!

Regards,
Boqun

> pv_wait_node                           pv_kick_node
> 
>                                        [L] pn->state
>                                        // this load is reordered up from the
>                                        // cmpxchg_release below.
> smp_store_mb(pn->state, vcpu_halted);
> if (!READ_ONCE(node->locked))
> 
>                                        arch_mcs_spin_unlock_contended(&next->locked);
> 
>                                        [-L] cmpxchg_release(&pn->state, vcpu_halted, vcpu_hashed)
>                                        // cmpxchg_release fails, so pn->state stays as it is.
> 
> pv_wait(&pn->state, vcpu_halted);
> // On PPC, it will not return until pn->state != vcpu_halted.
> 
> And when the RCU stall hit, I fired a BUG() and entered debug mode; it
> seems most CPUs are in pv_wait...
> 
> So the solution to this problem is simple: keep the cmpxchg as it is in
> pv_kick_node(); cmpxchg on PPC provides full barriers.
> 
> thanks
> xinhui
> 
> > Cheers,
> > Longman
> 
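For completeness, the lost-wakeup window can be reproduced outside the
kernel. Below is a store-buffering litmus sketch in C11 -- an
approximation under stated assumptions, not the kernel code: the
seq_cst store/load pair stands in for smp_store_mb() followed by
READ_ONCE(node->locked) on the waiter side, and the release store
followed by a relaxed load stands in for the store to next->locked
followed by the load half of cmpxchg_release() on the kicker side.

/* sb_litmus.c -- build with: gcc -O2 -pthread sb_litmus.c */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define ITERS 100000

static atomic_int halted, locked;	/* model pn->state, node->locked */
static atomic_int go, done;
static int r_waiter, r_kicker;

static void *waiter(void *unused)
{
	while (!atomic_load(&go))
		;
	/* smp_store_mb(pn->state, vcpu_halted); READ_ONCE(node->locked) */
	atomic_store_explicit(&halted, 1, memory_order_seq_cst);
	r_waiter = atomic_load_explicit(&locked, memory_order_seq_cst);
	atomic_fetch_add(&done, 1);
	return NULL;
}

static void *kicker(void *unused)
{
	while (!atomic_load(&go))
		;
	/*
	 * Store next->locked, then the load half of cmpxchg_release():
	 * release ordering gives no StoreLoad, so the load may be
	 * satisfied before the store is visible to the waiter.
	 */
	atomic_store_explicit(&locked, 1, memory_order_release);
	r_kicker = atomic_load_explicit(&halted, memory_order_relaxed);
	atomic_fetch_add(&done, 1);
	return NULL;
}

int main(void)
{
	long lost = 0;

	for (int i = 0; i < ITERS; i++) {
		pthread_t w, k;

		atomic_store(&halted, 0);
		atomic_store(&locked, 0);
		atomic_store(&done, 0);
		atomic_store(&go, 0);
		pthread_create(&w, NULL, waiter, NULL);
		pthread_create(&k, NULL, kicker, NULL);
		atomic_store(&go, 1);
		while (atomic_load(&done) != 2)
			;
		pthread_join(w, NULL);
		pthread_join(k, NULL);
		/* Both saw 0: waiter sleeps, kicker sees no need to kick. */
		if (r_waiter == 0 && r_kicker == 0)
			lost++;
	}
	printf("lost wakeups: %ld / %d\n", lost, ITERS);
	return 0;
}

A nonzero count -- observable even on x86, where the kicker's plain
store can sit in the store buffer past the following load -- is exactly
the state in which the vCPU parks in pv_wait() with nobody left to kick
it, matching the stall Xinhui observed.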