On 07/14/2016 07:39 AM, Wanpeng Li wrote:
From: Wanpeng Li<[email protected]>

When the lock holder vCPU is racing with the queue head:

    CPU 0 (lock holder)    CPU 1 (queue head)
    ===================    =================
    spin_lock();           spin_lock();
     pv_kick_node():        pv_wait_head_or_lock():
                             if (!lp) {
                              lp = pv_hash(lock, pn);
                              xchg(&l->locked, _Q_SLOW_VAL);
                             }
                             WRITE_ONCE(pn->state, vcpu_halted);
      cmpxchg(&pn->state,
       vcpu_halted, vcpu_hashed);
      WRITE_ONCE(l->locked, _Q_SLOW_VAL);
      (void)pv_hash(lock, pn);

In this case, lock holder inserts the pv_node of queue head into the
hash table and set _Q_SLOW_VAL which can result in hash entry leak.
This patch avoids it by restoring/setting vcpu_hashed state after
failing adaptive locking spinning.

Reviewed-by: Pan Xinhui<[email protected]>
Cc: Peter Zijlstra (Intel)<[email protected]>
Cc: Ingo Molnar<[email protected]>
Cc: Waiman Long<[email protected]>
Cc: Davidlohr Bueso<[email protected]>
Signed-off-by: Wanpeng Li<[email protected]>
---
v2 ->  v3:
  * fix typo in patch description
v1 ->  v2:
  * adjust patch description

  kernel/locking/qspinlock_paravirt.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/locking/qspinlock_paravirt.h 
b/kernel/locking/qspinlock_paravirt.h
index 21ede57..ac7d20b 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -450,7 +450,7 @@ pv_wait_head_or_lock(struct qspinlock *lock, struct 
mcs_spinlock *node)
                                goto gotlock;
                        }
                }
-               WRITE_ONCE(pn->state, vcpu_halted);
+               WRITE_ONCE(pn->state, vcpu_hashed);
                qstat_inc(qstat_pv_wait_head, true);
                qstat_inc(qstat_pv_wait_again, waitcnt);
                pv_wait(&l->locked, _Q_SLOW_VAL);

As pv_kick_node() is called immediately after designating the next node as the queue head, the chance of this racing is possible, but is not likely unless the lock holder vCPU gets preempted for a long time at that right moment. This change does not do any harm though, so I am OK with that. However, I do want you to add a comment about the possible race in the code as it isn't that obvious or likely.

Cheers,
Longman

Reply via email to