On Fri, 2014-01-31 at 21:08 +0100, Peter Zijlstra wrote:
> On Fri, Jan 31, 2014 at 12:01:37PM -0800, Jason Low wrote:
> > Currently still getting soft lockups with the updated version.
>
> Bugger.. ok clearly I need to think harder still. I'm fairly sure this
> cancelation can work though, just seems tricky to get right :-)
Ok, I believe I have found a race condition between m_spin_lock() and
m_spin_unlock().

In m_spin_unlock(), we do "next = ACCESS_ONCE(node->next)" and, if next
is not NULL, we proceed to set next->locked to 1. A thread in the
unqueue path of m_spin_lock() can execute
"next = cmpxchg(&prev->next, node, NULL)" after the thread in
m_spin_unlock() has read its node->next and found it non-NULL, and can
then check !node->locked before the thread in m_spin_unlock() stores 1
to next->locked. The unlocker then sets ->locked on a node that has
already removed itself from the queue, so the handoff is lost and the
remaining waiters spin forever, which would explain the soft lockups.

The following additional change, which has the unlocker claim the
successor atomically instead of just reading it, was able to solve the
initial lockups that were occurring when running fserver on a 2 socket
box.

---
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 9eb4dbe..e71a84a 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -513,8 +513,13 @@ static void m_spin_unlock(struct m_spinlock **lock)
 			return;
 
 		next = ACCESS_ONCE(node->next);
-		if (unlikely(next))
-			break;
+
+		if (unlikely(next)) {
+			next = cmpxchg(&node->next, next, NULL);
+
+			if (next)
+				break;
+		}
 
 		arch_mutex_cpu_relax();
 	}
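For reference, the unlock slow path with this change would look
something like the following. This is an untested sketch: the struct
layout, the per-CPU m_node variable, and the release fast path are from
my recollection of your earlier patch rather than copied from it, so
the names may not match exactly; the point is only to show why the
cmpxchg closes the window.

/*
 * Sketch only -- struct layout, m_node, and the release fast path
 * are assumed from the earlier patch, not quoted from it.
 */
struct m_spinlock {
	struct m_spinlock *next, *prev;
	int locked;		/* 1 once we have been handed the lock */
};

static DEFINE_PER_CPU(struct m_spinlock, m_node);

static void m_spin_unlock(struct m_spinlock **lock)
{
	struct m_spinlock *node = this_cpu_ptr(&m_node);
	struct m_spinlock *next;

	for (;;) {
		/* No known successor: try to release the lock outright. */
		if (cmpxchg(lock, node, NULL) == node)
			return;

		next = ACCESS_ONCE(node->next);

		if (unlikely(next)) {
			/*
			 * Claim the successor atomically. An unqueueing
			 * waiter races with us on this same word via its
			 * cmpxchg(&prev->next, node, NULL), so exactly one
			 * of us wins. The plain ACCESS_ONCE() read alone is
			 * what allowed both sides to proceed and lose the
			 * handoff.
			 */
			next = cmpxchg(&node->next, next, NULL);
			if (next)
				break;
		}

		arch_mutex_cpu_relax();
	}

	/* We won the claim: the successor is still queued, wake it. */
	ACCESS_ONCE(next->locked) = 1;
}

With both sides using cmpxchg on the same ->next word, the unlocker
either claims a successor that is guaranteed to still be queued, or it
sees the claim fail and keeps spinning until the release fast path
succeeds.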