Hey Thomas, Peter-

Gratian and I have been debugging a nasty and difficult race, w/ futexes seemingly the culprit. The original symptom we were seeing was a seemingly spurious -EDEADLK from a futex(LOCK_PI) operation.

On further analysis, however, it appears the thread which gets the spurious -EDEADLK has observed a weird futex state: a prior futex(WAIT_REQUEUE_PI) operation has returned -ETIMEDOUT, but the uaddr2 futex word owner field indicates that it's the owner.

Here's an attempt to boil down this situation into a pseudo trace; I'm happy to forward along the full traces as well, if that would be helpful:

  waiter                            waker                             stealer (prio > waiter)

  futex(WAIT_REQUEUE_PI, uaddr, uaddr2,
        timeout=[N ms])
    futex_wait_requeue_pi()
      futex_wait_queue_me()
        freezable_schedule()
        <scheduled out>
                                    futex(LOCK_PI, uaddr2)
                                    futex(CMP_REQUEUE_PI, uaddr,
                                          uaddr2, 1, 0)
                                      /* requeues waiter to uaddr2 */
                                    futex(UNLOCK_PI, uaddr2)
                                      wake_futex_pi()
                                        cmp_futex_value_locked(uaddr, waiter)
                                        wake_up_q()
        <woken by waker>
        <hrtimer_wakeup() fires,
         clears sleeper->task>
                                                                      futex(LOCK_PI, uaddr2)
                                                                        __rt_mutex_start_proxy_lock()
                                                                          try_to_take_rt_mutex() /* steals lock */
                                                                            rt_mutex_set_owner(lock, stealer)
                                                                        <preempted>
      <scheduled in>
      rt_mutex_wait_proxy_lock()
        __rt_mutex_slowlock()
          try_to_take_rt_mutex() /* fails, lock held by stealer */
          if (timeout && !timeout->task)
            return -ETIMEDOUT;
        fixup_owner()
          /* lock wasn't acquired, so,
             fixup_pi_state_owner skipped */
  return -ETIMEDOUT;

  /* At this point, we've returned -ETIMEDOUT to userspace, but the
   * futex word shows waiter to be the owner, and the pi_mutex has
   * stealer as the owner */

  futex_lock(LOCK_PI, uaddr2)
    -> bails with EDEADLK, futex word says we're owner.

At some later point in execution, the stealer gets scheduled back in and will do fixup_owner(), which fixes up the futex word, but at that point it's too late: the waiter has already observed the wonky state.

fixup_owner() used to have additional, seemingly relevant checks in place that were removed in 73d786bd043eb ("futex: Rework inconsistent rt_mutex/futex_q state").

The actual kernel we've been testing is 4.9.33-rt23, w/ 153fbd1226fb3 ("futex: Fix more put_pi_state() vs. exit_pi_state_list() races") cherry-picked, w/ PREEMPT_RT_FULL. However, it appears that this issue may affect v4.15-rc1?

Thoughts on how to move forward?

Nasty.

   Julia
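
P.S. For anyone wanting to spot the inconsistent state from userspace, here is a rough sketch of the check, not something lifted from our test code. The bit values mirror FUTEX_WAITERS, FUTEX_OWNER_DIED and FUTEX_TID_MASK from <linux/futex.h>; the PI_FUTEX_* names and the three helpers are hypothetical, made up purely for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit layout of a PI futex word. Values mirror FUTEX_WAITERS,
 * FUTEX_OWNER_DIED and FUTEX_TID_MASK in <linux/futex.h>; they are
 * redefined under hypothetical names so this builds without that header. */
#define PI_FUTEX_WAITERS    0x80000000u
#define PI_FUTEX_OWNER_DIED 0x40000000u
#define PI_FUTEX_TID_MASK   0x3fffffffu

/* TID recorded in the owner field of the futex word (0 means unowned). */
static inline uint32_t pi_futex_owner_tid(uint32_t word)
{
	return word & PI_FUTEX_TID_MASK;
}

/* True if the kernel has marked the futex as having waiters. */
static inline bool pi_futex_has_waiters(uint32_t word)
{
	return (word & PI_FUTEX_WAITERS) != 0;
}

/* Hypothetical check for the state in the trace above: our last
 * WAIT_REQUEUE_PI timed out, yet the uaddr2 word still names us as owner.
 * A following futex(LOCK_PI, uaddr2) would then fail with EDEADLK. */
static inline bool pi_futex_stale_self_owner(uint32_t word, uint32_t self_tid,
					     bool last_op_timed_out)
{
	return last_op_timed_out && pi_futex_owner_tid(word) == self_tid;
}
```

The idea would be to load the uaddr2 word right after futex(WAIT_REQUEUE_PI) returns -ETIMEDOUT and pass it, together with the caller's TID (as returned by gettid()), to pi_futex_stale_self_owner(). The result is only a snapshot, since the stealer's fixup_owner() can repair the word at any moment.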