On Thu, Oct 27, 2016 at 10:36:00PM +0200, Thomas Gleixner wrote:
> So that waiter which is now spinning on pi_mutex->lock has already set the
> waiters bit, which you undo. So you created the following scenario:
> 
> CPU0                    CPU1                          CPU2
> 
> TID 0x1001              TID 0x1000                    TID 0x1002
> 
>                         Acquires futex in user space
>                         futex value = 0x1000;
> 
> futex_lock_pi()
>   
>   lock_hb()
>   set_waiter_bit()
>   --> futex value = 0x40001000;
> 
>                         futex_unlock_pi()
>                          lock_hb()
>   attach_to_pi_owner()
>     rt_mutex_init_proxy_locked()
>   queue_me()
>     unlock_hb()
>                          unlock_hb()
>   rt_mutex_lock()        wake_futex_pi()
>                          lock(pi_mutex->lock);
>    lock(pi_mutex->lock)          new_owner is NULL;
>                           --> futex value = 0;
>                          rt_mutex_futex_unlock(pi_mutex);
>                          unlock(pi_mutex->lock);
>    acquire_rt_mutex()    return to user space
>                                                       Acquires futex in user 
> space
>                                                       --> futex value = 0x1002
>   fixup_owner()
>     fixup_pi_state_owner()
>        uval = 0x1002;
>        newval = 0x40001001;
>        cmpxchg(uval, newval) succeeds
>        --> futex value = 0x40001001
> 
> Voila. Inconsistent state .... TID 0x1002 is borked now.
OK, I think its much worse...


Consider:


CPU0 (tid=0x1000)               CPU1 (tid=0x1001)               CPU2 
(tid=0x1002)


acquire in userspace
  *uaddr = 0x1000

                                futex_lock_pi()
                                  acq hb->lock
                                  attach_to_pi_owner
                                    pi_state->refcount == 1
                                  queue_me
                                    rel hb->lock
                                                                futex_lock_pi()
                                                                  acq hb->lock
                                                                  
attach_to_pi_state
                                                                    
pi_state->refcount == 2
                                                                  queue_me
                                                                    rel hb->lock

futex_unlock_pi()
  acq pi_mutex->lock
  top_waiter == (CPU1 | CPU2)
  wake_futex_pi()
    new_owner == NULL
      pi_state->owner = &init_task
      *uaddr = 0
    rt_mutex_futex_unlock()
  ret-to-user


acquire in userspace
  *uaddr = 0x1000



>From here there's multiple ways to trainwreck the thing, the way you
list above, lets call this A):

                                                                  
rt_mutex_lock()
                                                                  acq hb->lock
                                                                  fixup_owner()
                                                                    
cmpxchg(uaddr, 0x1000, 0x40001002)

                                                                    *BORKED 
CPU0*

alternatively we can do; B):

                        CPU3 (or CPU0 with a different tid)

                        futex_lock_pi()
                          acq hb->lock
                          attach_to_pi_state()
                            -EINVAL : *uaddr != task_pid_vnr(&init_task)



Now, since CPU1 is pinning the hb->waiter relation, the proposed
fixup_owner() -EAGAIN change cannot work either, since, A1):

                                                                  
rt_mutex_lock()
                                                                  acq hb->lock
                                                                  fixup_owner() 
: -EAGAIN
                                                                  
unqueue_me_pi()
                                                                    put_pi_state
                                                                      
pi_state->refcount == 1
                                                                    rel hb->lock
                                                                  goto 
retry_private

                                                                retry_private:
                                                                  acq hb->lock
                                                                  
attach_to_pi_state() : -EINVAL


Similar things happen with CMP_REQUEUE doing a lookup_pi_state() around
this point.

I'm sorely tempted to do the 'simple' thing and leak FUTEX_WAITERS,
forcing things into the slow path.


Reply via email to