On Thu, Oct 27, 2016 at 10:36:00PM +0200, Thomas Gleixner wrote: > So that waiter which is now spinning on pi_mutex->lock has already set the > waiters bit, which you undo. So you created the following scenario: > > CPU0 CPU1 CPU2 > > TID 0x1001 TID 0x1000 TID 0x1002 > > Acquires futex in user space > futex value = 0x1000; > > futex_lock_pi() > > lock_hb() > set_waiter_bit() > --> futex value = 0x40001000; > > futex_unlock_pi() > lock_hb() > attach_to_pi_owner() > rt_mutex_init_proxy_locked() > queue_me() > unlock_hb() > unlock_hb() > rt_mutex_lock() wake_futex_pi() > lock(pi_mutex->lock); > lock(pi_mutex->lock) new_owner is NULL; > --> futex value = 0; > rt_mutex_futex_unlock(pi_mutex); > unlock(pi_mutex->lock); > acquire_rt_mutex() return to user space > Acquires futex in user > space > --> futex value = 0x1002 > fixup_owner() > fixup_pi_state_owner() > uval = 0x1002; > newval = 0x40001001; > cmpxchg(uval, newval) succeeds > --> futex value = 0x40001001 > > Voila. Inconsistent state .... TID 0x1002 is borked now.
OK, I think its much worse... Consider: CPU0 (tid=0x1000) CPU1 (tid=0x1001) CPU2 (tid=0x1002) acquire in userspace *uaddr = 0x1000 futex_lock_pi() acq hb->lock attach_to_pi_owner pi_state->refcount == 1 queue_me rel hb->lock futex_lock_pi() acq hb->lock attach_to_pi_state pi_state->refcount == 2 queue_me rel hb->lock futex_unlock_pi() acq pi_mutex->lock top_waiter == (CPU1 | CPU2) wake_futex_pi() new_owner == NULL pi_state->owner = &init_task *uaddr = 0 rt_mutex_futex_unlock() ret-to-user acquire in userspace *uaddr = 0x1000 >From here there's multiple ways to trainwreck the thing, the way you list above, lets call this A): rt_mutex_lock() acq hb->lock fixup_owner() cmpxchg(uaddr, 0x1000, 0x40001002) *BORKED CPU0* alternatively we can do; B): CPU3 (or CPU0 with a different tid) futex_lock_pi() acq hb->lock attach_to_pi_state() -EINVAL : *uaddr != task_pid_vnr(&init_task) Now, since CPU1 is pinning the hb->waiter relation, the proposed fixup_owner() -EAGAIN change cannot work either, since, A1): rt_mutex_lock() acq hb->lock fixup_owner() : -EAGAIN unqueue_me_pi() put_pi_state pi_state->refcount == 1 rel hb->lock goto retry_private retry_private: acq hb->lock attach_to_pi_state() : -EINVAL Similar things happen with CMP_REQUEUE doing a lookup_pi_state() around this point. I'm sorely tempted to do the 'simple' thing and leak FUTEX_WAITERS, forcing things into the slow path.