On Tue, Apr 8, 2014 at 2:02 PM, Jan Stancek <jstan...@redhat.com> wrote: > > I ran reproducer with following change on s390x system, where this > can be reproduced usually within seconds: > > diff --git a/kernel/futex.c b/kernel/futex.c > index 67dacaf..9150ffd 100644 > --- a/kernel/futex.c > +++ b/kernel/futex.c > @@ -1095,6 +1095,7 @@ static int unlock_futex_pi(u32 __user *uaddr, u32 uval) > static inline void > double_lock_hb(struct futex_hash_bucket *hb1, struct futex_hash_bucket *hb2) > { > + hb_waiters_inc(hb2); > if (hb1 <= hb2) { > spin_lock(&hb1->lock); > if (hb1 < hb2) > @@ -1111,6 +1112,7 @@ double_unlock_hb(struct futex_hash_bucket *hb1, struct > futex_hash_bucket *hb2) > spin_unlock(&hb1->lock); > if (hb1 != hb2) > spin_unlock(&hb2->lock); > + hb_waiters_dec(hb2); > } > > /* > > Reproducer is running without failures over an hour now and > made ~1.4 million iterations.
Ok, that's encouraging. That is the smallest patch I could come up with, but as mentioned, it's not optimal. We only need it for futex_requeue(), but if we do it there we'd have to handle all the different error cases (there's only one call to double_lock_hb(), but due to the error cases there's four calls to double_unlock_hb(). I'm not sure how much we care. The simple patch basically adds two (unnecessary) atomics to the futex_wake_op() path. I don't know how critical that path is - not as critical as the regular "futex_wake()", I'd expect, but I guess pthread_cond_signal() is the main user. So I'll have to leave this decision to the futex people. But the attached slightly more complex patch *may* be the better one. May I bother you to test this one too? I really think that futex_requeue() is the only user that should need this, so doing it there rather than in double_[un]lock_hb() should be slightly more optimal, but who knows what I've missed. We clearly *all* missed this race back when the ordering rules were documented.. Still hoping for comments from PeterZ and Davidlohr. Linus
kernel/futex.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/kernel/futex.c b/kernel/futex.c index 67dacaf93e56..6801b3751a95 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -1452,6 +1452,7 @@ retry: hb2 = hash_futex(&key2); retry_private: + hb_waiters_inc(hb2); double_lock_hb(hb1, hb2); if (likely(cmpval != NULL)) { @@ -1461,6 +1462,7 @@ retry_private: if (unlikely(ret)) { double_unlock_hb(hb1, hb2); + hb_waiters_dec(hb2); ret = get_user(curval, uaddr1); if (ret) @@ -1510,6 +1512,7 @@ retry_private: break; case -EFAULT: double_unlock_hb(hb1, hb2); + hb_waiters_dec(hb2); put_futex_key(&key2); put_futex_key(&key1); ret = fault_in_user_writeable(uaddr2); @@ -1519,6 +1522,7 @@ retry_private: case -EAGAIN: /* The owner was exiting, try again. */ double_unlock_hb(hb1, hb2); + hb_waiters_dec(hb2); put_futex_key(&key2); put_futex_key(&key1); cond_resched(); @@ -1594,6 +1598,7 @@ retry_private: out_unlock: double_unlock_hb(hb1, hb2); + hb_waiters_dec(hb2); /* * drop_futex_key_refs() must be called outside the spinlocks. During