wait style locks, v3

Maarten Lankhorst Wed, 22 May 2013 19:24:38 +0200

Hey,

Op 22-05-13 18:18, Peter Zijlstra schreef:
> On Wed, May 22, 2013 at 01:18:14PM +0200, Maarten Lankhorst wrote:
>
> Lacking the actual msg atm, I'm going to paste in here...
Thanks for taking the time to review.
>> Subject: [PATCH v3 2/3] mutex: add support for wound/wait style locks, v3
>> From: Maarten Lankhorst <maarten.lankhorst at xxxxxxxxxxxxx>
>>
>> Changes since RFC patch v1:
>>  - Updated to use atomic_long instead of atomic, since the reservation_id 
>> was a long.
>>  - added mutex_reserve_lock_slow and mutex_reserve_lock_intr_slow
>>  - removed mutex_locked_set_reservation_id (or w/e it was called)
>> Changes since RFC patch v2:
>>  - remove use of __mutex_lock_retval_arg, add warnings when using wrong 
>> combination of
>>    mutex_(,reserve_)lock/unlock.
>> Changes since v1:
>>  - Add __always_inline to __mutex_lock_common, otherwise reservation paths 
>> can be
>>    triggered from normal locks, because __builtin_constant_p might evaluate 
>> to false
>>    for the constant 0 in that case. Tests for this have been added in the 
>> next patch.
>>  - Updated documentation slightly.
>> Changes since v2:
>>  - Renamed everything to ww_mutex. (mlankhorst)
>>  - Added ww_acquire_ctx and ww_class. (mlankhorst)
>>  - Added a lot of checks for wrong api usage. (mlankhorst)
>>  - Documentation updates. (danvet)
>>
>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst at xxxxxxxxxxxxx>
>> Signed-off-by: Daniel Vetter <daniel.vetter at xxxxxxxx>
>> ---
>>  Documentation/ww-mutex-design.txt |  322 +++++++++++++++++++++++++++
>>  include/linux/mutex-debug.h       |    1 
>>  include/linux/mutex.h             |  257 +++++++++++++++++++++
>>  kernel/mutex.c                    |  445 
>> ++++++++++++++++++++++++++++++++++++-
>>  lib/debug_locks.c                 |    2 
>>  5 files changed, 1010 insertions(+), 17 deletions(-)
>>  create mode 100644 Documentation/ww-mutex-design.txt
>>
>> diff --git a/Documentation/ww-mutex-design.txt 
>> b/Documentation/ww-mutex-design.txt
>> new file mode 100644
>> index 0000000..154bae3
>> --- /dev/null
>> +++ b/Documentation/ww-mutex-design.txt
>> @@ -0,0 +1,322 @@
>> +Wait/Wound Deadlock-Proof Mutex Design
>> +======================================
>> +
>> +Please read mutex-design.txt first, as it applies to wait/wound mutexes too.
>> +
>> +Motivation for WW-Mutexes
>> +-------------------------
>> +
>> +GPU's do operations that commonly involve many buffers.  Those buffers
>> +can be shared across contexts/processes, exist in different memory
>> +domains (for example VRAM vs system memory), and so on.  And with
>> +PRIME / dmabuf, they can even be shared across devices.  So there are
>> +a handful of situations where the driver needs to wait for buffers to
>> +become ready.  If you think about this in terms of waiting on a buffer
>> +mutex for it to become available, this presents a problem because
>> +there is no way to guarantee that buffers appear in a execbuf/batch in
>> +the same order in all contexts.  That is directly under control of
>> +userspace, and a result of the sequence of GL calls that an application
>> +makes.      Which results in the potential for deadlock.  The problem gets
>> +more complex when you consider that the kernel may need to migrate the
>> +buffer(s) into VRAM before the GPU operates on the buffer(s), which
>> +may in turn require evicting some other buffers (and you don't want to
>> +evict other buffers which are already queued up to the GPU), but for a
>> +simplified understanding of the problem you can ignore this.
>> +
>> +The algorithm that TTM came up with for dealing with this problem is quite
>> +simple.  For each group of buffers (execbuf) that need to be locked, the 
>> caller
>> +would be assigned a unique reservation id/ticket, from a global counter.  In
>> +case of deadlock while locking all the buffers associated with a execbuf, 
>> the
>> +one with the lowest reservation ticket (i.e. the oldest task) wins, and the 
>> one
>> +with the higher reservation id (i.e. the younger task) unlocks all of the
>> +buffers that it has already locked, and then tries again.
>> +
>> +In the RDBMS literature this deadlock handling approach is called 
>> wait/wound:
>> +The older tasks waits until it can acquire the contended lock. The younger 
>> tasks
>> +needs to back off and drop all the locks it is currently holding, i.e. the
>> +younger task is wounded.
>> +
>> +Concepts
>> +--------
>> +
>> +Compared to normal mutexes two additional concepts/objects show up in the 
>> lock
>> +interface for w/w mutexes:
>> +
>> +Acquire context: To ensure eventual forward progress it is important the a 
>> task
>> +trying to acquire locks doesn't grab a new reservation id, but keeps the 
>> one it
>> +acquired when starting the lock acquisition. This ticket is stored in the
>> +acquire context. Furthermore the acquire context keeps track of debugging 
>> state
>> +to catch w/w mutex interface abuse.
>> +
>> +W/w class: In contrast to normal mutexes the lock class needs to be 
>> explicit for
>> +w/w mutexes, since it is required to initialize the acquire context.
>> +
>> +Furthermore there are three different classe of w/w lock acquire functions:
>> +- Normal lock acquisition with a context, using ww_mutex_lock
>> +- Slowpath lock acquisition on the contending lock, used by the wounded task
>> +  after having dropped all already acquired locks. These functions have the
>> +  _slow postfix.
> See below, I don't see the need for this interface.
>
>> +- Functions to only acquire a single w/w mutex, which results in the exact 
>> same
>> +  semantics as a normal mutex. These functions have the _single postfix.
> This is missing rationale.
trylock_single is useful when iterating over a list, and you want to evict a 
bo, but only the first one that can be acquired.
lock_single is useful when only a single bo needs to be acquired, for example 
to lock a buffer during mmap.


>> +
>> +Of course, all the usual variants for handling wake-ups due to signals are 
>> also
>> +provided.
>> +
>> +Usage
>> +-----
>> +
>> +Three different ways to acquire locks within the same w/w class. Common
>> +definitions for methods 1&2.
>> +
>> +static DEFINE_WW_CLASS(ww_class);
>> +
>> +struct obj {
>> +    sct ww_mutex lock;
>> +    /* obj data */
>> +};
>> +
>> +struct obj_entry {
>> +    struct list_head *list;
>> +    struct obj *obj;
>> +};
>> +
>> +Method 1, using a list in execbuf->buffers that's not allowed to be 
>> reordered.
>> +This is useful if a list of required objects is already tracked somewhere.
>> +Furthermore the lock helper can use propagate the -EALREADY return code 
>> back to
>> +the caller as a signal that an object is twice on the list. This is useful 
>> if
>> +the list is constructed from userspace input and the ABI requires userspace 
>> to
>> +no have duplicate entries (e.g. for a gpu commandbuffer submission ioctl).
>> +
>> +int lock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
>> +{
>> +    struct obj *res_obj = NULL;
>> +    struct obj_entry *contended_entry = NULL;
>> +    struct obj_entry *entry;
>> +
>> +    ww_acquire_init(ctx, &ww_class);
>> +
>> +retry:
>> +    list_for_each_entry (list, entry) {
>> +            if (entry == res_obj) {
>> +                    res_obj = NULL;
>> +                    continue;
>> +            }
>> +            ret = ww_mutex_lock(&entry->obj->lock, ctx);
>> +            if (ret < 0) {
>> +                    contended_obj = entry;
>> +                    goto err;
>> +            }
>> +    }
>> +
>> +    ww_acquire_done(ctx);
>> +    return 0;
>> +
>> +err:
>> +    list_for_each_entry_continue_reverse (list, contended_entry, entry)
>> +            ww_mutex_unlock(&entry->obj->lock);
>> +
>> +    if (res_obj)
>> +            ww_mutex_unlock(&res_obj->lock);
>> +
>> +    if (ret == -EDEADLK) {
>> +            /* we lost out in a seqno race, lock and retry.. */
>> +            ww_mutex_lock_slow(&contended_entry->obj->lock, ctx);
> I missing the need for ww_mutex_lock_slow(). AFAICT we should be able to tell
> its the first lock in the ctx and thus we cannot possibly deadlock.
Theoretically true, but that would require always setting ctx->acquired 
correctly.
Plus that would weaken the checks. Without ww_mutex_lock_slow you can not
say for sure all mutexes have been unlocked, and tell that what you say is 
really true.

>> +            res_obj = contended_entry->obj;
>> +            goto retry;
>> +    }
>> +    ww_acquire_fini(ctx);
>> +
>> +    return ret;
>> +}
>> +
> ... you certainly went all out on documentation.
>
>> diff --git a/include/linux/mutex-debug.h b/include/linux/mutex-debug.h
>> index 731d77d..4ac8b19 100644
>> --- a/include/linux/mutex-debug.h
>> +++ b/include/linux/mutex-debug.h
>> @@ -3,6 +3,7 @@
>>  
>>  #include <linux/linkage.h>
>>  #include <linux/lockdep.h>
>> +#include <linux/debug_locks.h>
>>  
>>  /*
>>   * Mutexes - debugging helpers:
>> diff --git a/include/linux/mutex.h b/include/linux/mutex.h
>> index 9121595..004f863 100644
>> --- a/include/linux/mutex.h
>> +++ b/include/linux/mutex.h
>> @@ -74,6 +74,35 @@ struct mutex_waiter {
>>  #endif
>>  };
>>  
>> +struct ww_class {
>> +    atomic_long_t stamp;
>> +    struct lock_class_key acquire_key;
>> +    struct lock_class_key mutex_key;
>> +    const char *acquire_name;
>> +    const char *mutex_name;
>> +};
>> +
>> +struct ww_acquire_ctx {
>> +    struct task_struct *task;
>> +    unsigned long stamp;
>> +#ifdef CONFIG_DEBUG_MUTEXES
>> +    unsigned acquired, done_acquire;
>> +    struct ww_class *ww_class;
>> +    struct ww_mutex *contending_lock;
>> +#endif
>> +#ifdef CONFIG_DEBUG_LOCK_ALLOC
>> +    struct lockdep_map dep_map;
>> +#endif
>> +};
>> +
>> +struct ww_mutex {
>> +    struct mutex base;
>> +    struct ww_acquire_ctx *ctx;
>> +#ifdef CONFIG_DEBUG_MUTEXES
>> +    struct ww_class *ww_class;
>> +#endif
>> +};
>> +
>
>> @@ -167,6 +236,192 @@ extern int __must_check mutex_lock_killable(struct 
>> mutex *lock);
>>   */
>>  extern int mutex_trylock(struct mutex *lock);
>>  extern void mutex_unlock(struct mutex *lock);
>> +
>> +/**
>> + * ww_acquire_init - initialize a w/w acquire context
>> + * @ctx: w/w acquire context to initialize
>> + * @ww_class: w/w class of the context
>> + *
>> + * Initializes an context to acquire multiple mutexes of the given w/w 
>> class.
>> + *
>> + * Context-based w/w mutex acquiring can be done in any order whatsoever 
>> within
>> + * a given lock class. Deadlocks will be detected and handled with the
>> + * wait/wound logic.
>> + *
>> + * Mixing of context-based w/w mutex acquiring and single w/w mutex locking 
>> can
>> + * result in undetected deadlocks and is so forbidden. Mixing different 
>> contexts
>> + * for the same w/w class when acquiring mutexes can also result in 
>> undetected
>> + * deadlocks, and is hence also forbidden.
>> + *
>> + * Nesting of acquire contexts for _different_ w/w classes is possible, 
>> subject
>> + * to the usual locking rules between different lock classes.
>> + *
>> + * An acquire context must be release by the same task before the memory is
>> + * freed with ww_acquire_fini. It is recommended to allocate the context 
>> itself
>> + * on the stack.
>> + */
>> +static inline void ww_acquire_init(struct ww_acquire_ctx *ctx,
>> +                               struct ww_class *ww_class)
>> +{
>> +    ctx->task = current;
>> +    do {
>> +            ctx->stamp = atomic_long_inc_return(&ww_class->stamp);
>> +    } while (unlikely(!ctx->stamp));
> I suppose we'll figure something out when this becomes a bottleneck. Ideally
> we'd do something like:
>
>  ctx->stamp = local_clock();
>
> but for now we cannot guarantee that's not jiffies, and I suppose that's a tad
> too coarse to work for this.
This might mess up when 2 cores happen to return exactly the same time, how do 
you choose a winner in that case?
EDIT: Using pointer address like you suggested below is fine with me. ctx 
pointer would be static enough.

> Also, why is 0 special?
Oops, 0 is no longer special.

I used to set the samp directly on the lock, so 0 used to mean no ctx set.
>> +#ifdef CONFIG_DEBUG_MUTEXES
>> +    ctx->ww_class = ww_class;
>> +    ctx->acquired = ctx->done_acquire = 0;
>> +    ctx->contending_lock = NULL;
>> +#endif
>> +#ifdef CONFIG_DEBUG_LOCK_ALLOC
>> +    debug_check_no_locks_freed((void *)ctx, sizeof(*ctx));
>> +    lockdep_init_map(&ctx->dep_map, ww_class->acquire_name,
>> +                     &ww_class->acquire_key, 0);
>> +    mutex_acquire(&ctx->dep_map, 0, 0, _RET_IP_);
>> +#endif
>> +}
>> +/**
>> + * ww_mutex_trylock_single - tries to acquire the w/w mutex without acquire 
>> context
>> + * @lock: mutex to lock
>> + *
>> + * Trylocks a mutex without acquire context, so no deadlock detection is
>> + * possible. Returns 0 if the mutex has been acquired.
>> + *
>> + * Unlocking the mutex must happen with a call to ww_mutex_unlock_single.
>> + */
>> +static inline int __must_check ww_mutex_trylock_single(struct ww_mutex 
>> *lock)
>> +{
>> +    return mutex_trylock(&lock->base);
>> +}
> trylocks can never deadlock they don't block per definition, I don't see the
> point of the _single() thing here.
I called it single because they weren't annotated into any ctx. I can drop the 
_single suffix though,
but you'd still need to unlock with unlock_single, or we need to remove that 
distinction altogether,
lose a few lockdep checks and only have a one unlock function.

>> +/**
>> + * ww_mutex_lock_single - acquire the w/w mutex without acquire context
>> + * @lock: mutex to lock
>> + *
>> + * Locks a mutex without acquire context, so no deadlock detection is
>> + * possible.
>> + *
>> + * Unlocking the mutex must happen with a call to ww_mutex_unlock_single.
>> + */
>> +static inline void ww_mutex_lock_single(struct ww_mutex *lock)
>> +{
>> +    mutex_lock(&lock->base);
>> +}
> as per the above, I'm missing the rationale for having this.
>
>> diff --git a/kernel/mutex.c b/kernel/mutex.c
>> index 84a5f07..66807c7 100644
>> --- a/kernel/mutex.c
>> +++ b/kernel/mutex.c
>> @@ -127,16 +127,156 @@ void __sched mutex_unlock(struct mutex *lock)
>>  
>>  EXPORT_SYMBOL(mutex_unlock);
>>  
>> +/**
>> + * ww_mutex_unlock - release the w/w mutex
>> + * @lock: the mutex to be released
>> + *
>> + * Unlock a mutex that has been locked by this task previously
>> + * with ww_mutex_lock* using an acquire context. It is forbidden to release 
>> the
>> + * locks after releasing the acquire context.
>> + *
>> + * This function must not be used in interrupt context. Unlocking
>> + * of a unlocked mutex is not allowed.
>> + *
>> + * Note that locks acquired with one of the ww_mutex_lock*single variant 
>> must be
>> + * unlocked with ww_mutex_unlock_single.
>> + */
>> +void __sched ww_mutex_unlock(struct ww_mutex *lock)
>> +{
>> +    /*
>> +     * The unlocking fastpath is the 0->1 transition from 'locked'
>> +     * into 'unlocked' state:
>> +     */
>> +#ifdef CONFIG_DEBUG_MUTEXES
>> +    DEBUG_LOCKS_WARN_ON(!lock->ctx);
>> +    if (lock->ctx) {
>> +            DEBUG_LOCKS_WARN_ON(!lock->ctx->acquired);
>> +            if (lock->ctx->acquired > 0)
>> +                    lock->ctx->acquired--;
>> +    }
>> +#endif
>> +    lock->ctx = NULL;
> barriers should always have a comment explaining the exact ordering and the
> pairing barrier's location.
>
>> +    smp_mb__before_atomic_inc();
>> +
>> +#ifndef CONFIG_DEBUG_MUTEXES
>> +    /*
>> +     * When debugging is enabled we must not clear the owner before time,
>> +     * the slow path will always be taken, and that clears the owner field
>> +     * after verifying that it was indeed current.
>> +     */
>> +    mutex_clear_owner(&lock->base);
>> +#endif
>> +    __mutex_fastpath_unlock(&lock->base.count, __mutex_unlock_slowpath);
>> +}
>> +EXPORT_SYMBOL(ww_mutex_unlock);
>> +
>> +static inline int __sched
>> +__mutex_lock_check_stamp(struct mutex *lock, struct ww_acquire_ctx *ctx)
>> +{
>> +    struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);
>> +    struct ww_acquire_ctx *hold_ctx = ACCESS_ONCE(ww->ctx);
>> +
>> +    if (!hold_ctx)
>> +            return 0;
>> +
>> +    if (unlikely(ctx->stamp == hold_ctx->stamp))
>> +            return -EALREADY;
> Why compare stamps? I expected: ctx == hold_ctx here.
Because the check just below it compares stamps too, having the same check
would make it clear that when ctx->stamp - hold_ctx->stamp == 0 is not expected.

>> +
>> +    if (unlikely(ctx->stamp - hold_ctx->stamp <= LONG_MAX)) {
> Why not simply write: ctx->stamp > hold_ctx->stamp ?
To handle the wraparound case on 32-bits?
> If we need to deal with equal stamps from different contexts we could 
> tie-break
> based on ctx address or so, but seeing its a global counter from the class,
> that shouldn't happen for now.
Sounds good enough to me.

>> +#ifdef CONFIG_DEBUG_MUTEXES
>> +            DEBUG_LOCKS_WARN_ON(ctx->contending_lock);
>> +            ctx->contending_lock = ww;
>> +#endif
>> +            return -EDEADLK;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static __always_inline void ww_mutex_lock_acquired(struct ww_mutex *ww,
>> +                                               struct ww_acquire_ctx 
>> *ww_ctx,
>> +                                               bool ww_slow)
>> +{
>> +#ifdef CONFIG_DEBUG_MUTEXES
>> +    /*
>> +     * If this WARN_ON triggers, you used mutex_lock to acquire,
>> +     * but released with ww_mutex_unlock in this call.
>> +     */
>> +    DEBUG_LOCKS_WARN_ON(ww->ctx);
>> +
>> +    /*
>> +     * Not quite done after ww_acquire_done() ?
>> +     */
>> +    DEBUG_LOCKS_WARN_ON(ww_ctx->done_acquire);
>> +
>> +    if (ww_slow) {
> s/ww_slow/!ww_ctx->acquired/
>
>> +            DEBUG_LOCKS_WARN_ON(ww_ctx->contending_lock != ww);
>> +            ww_ctx->contending_lock = NULL;
>> +    } else
>> +            DEBUG_LOCKS_WARN_ON(ww_ctx->contending_lock);
>> +
>> +
>> +    /*
>> +     * Naughty, using a different class can lead to undefined behavior!
>> +     */
>> +    DEBUG_LOCKS_WARN_ON(ww_ctx->ww_class != ww->ww_class);
>> +
>> +    if (ww_slow)
>> +            DEBUG_LOCKS_WARN_ON(ww_ctx->acquired > 0);
>> +
>> +    ww_ctx->acquired++;
>> +#endif
>> +}
>> +
>> +/*
>> + * after acquiring lock with fastpath or when we lost out in contested
>> + * slowpath, set ctx and wake up any waiters so they can recheck.
>> + *
>> + * This function is never called when CONFIG_DEBUG_LOCK_ALLOC is set,
>> + * as the fastpath and opportunistic spinning are disabled in that case.
>> + */
>> +static __always_inline void
>> +ww_mutex_set_context_fastpath(struct ww_mutex *lock,
>> +                           struct ww_acquire_ctx *ctx)
>> +{
>> +    unsigned long flags;
>> +    struct mutex_waiter *cur;
>> +
>> +    ww_mutex_lock_acquired(lock, ctx, false);
>> +
>> +    lock->ctx = ctx;
>  missing comment
Yeah, this was patched up as per danvet's command, moved the smp_mb__after 
upwards, and added a full smp_mb after setting lock->ctx.
>> +    smp_mb__after_atomic_dec();
>> +
>> +    /*
>> +     * Check if lock is contended, if not there is nobody to wake up
>> +     */
>> +    if (likely(atomic_read(&lock->base.count) == 0))
>> +            return;
>> +
>> +    /*
>> +     * Uh oh, we raced in fastpath, wake up everyone in this case,
>> +     * so they can see the new ctx
>> +     */
>> +    spin_lock_mutex(&lock->base.wait_lock, flags);
>> +    list_for_each_entry(cur, &lock->base.wait_list, list) {
>> +            debug_mutex_wake_waiter(&lock->base, cur);
>> +            wake_up_process(cur->task);
>> +    }
>> +    spin_unlock_mutex(&lock->base.wait_lock, flags);
>> +}
>> +
>>  /*
>>   * Lock a mutex (possibly interruptible), slowpath:
>>   */
>> -static inline int __sched
>> +static __always_inline int __sched
>>  __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>> -                struct lockdep_map *nest_lock, unsigned long ip)
>> +                struct lockdep_map *nest_lock, unsigned long ip,
>> +                struct ww_acquire_ctx *ww_ctx, bool ww_slow)
>>  {
>>      struct task_struct *task = current;
>>      struct mutex_waiter waiter;
>>      unsigned long flags;
>> +    int ret;
>>  
>>      preempt_disable();
>>      mutex_acquire_nest(&lock->dep_map, subclass, 0, nest_lock, ip);
>> @@ -163,6 +303,14 @@ __mutex_lock_common(struct mutex *lock, long state, 
>> unsigned int subclass,
>>      for (;;) {
>>              struct task_struct *owner;
>>  
>> +            if (!__builtin_constant_p(ww_ctx == NULL) && !ww_slow) {
> Since we _know_ ww_ctx isn't NULL, we can trivially do: 
> s/ww_slow/!ww_ctx->acquired/
>
>> +                    struct ww_mutex *ww;
>> +
>> +                    ww = container_of(lock, struct ww_mutex, base);
>> +                    if (ACCESS_ONCE(ww->ctx))
> What's the point of this ACCESS_ONCE()?
Break out of the spin_on_owner loop. Without taking spin_lock_mutex there is no 
guarantee that the
contents of ww->ctx are valid or sane in any way, so there's no way to check if 
we ought to back off or not.


>> +                            break;
>> +            }
>> +
>>              /*
>>               * If there's an owner, wait for it to either
>>               * release the lock or go to sleep.
>> @@ -173,6 +321,13 @@ __mutex_lock_common(struct mutex *lock, long state, 
>> unsigned int subclass,
>>  
>>              if (atomic_cmpxchg(&lock->count, 1, 0) == 1) {
>>                      lock_acquired(&lock->dep_map, ip);
> Should this not also have a __builtin_constant_p(ww_ctx == NULL) ?
ww_slow should not be set to non-zero when ww_ctx == NULL.

>> +                    if (ww_slow) {
>> +                            struct ww_mutex *ww;
>> +                            ww = container_of(lock, struct ww_mutex, base);
>> +
>> +                            ww_mutex_set_context_fastpath(ww, ww_ctx);
>> +                    }
>> +
>>                      mutex_set_owner(lock);
>>                      preempt_enable();
>>                      return 0;
>> @@ -228,15 +383,16 @@ __mutex_lock_common(struct mutex *lock, long state, 
>> unsigned int subclass,
>>               * TASK_UNINTERRUPTIBLE case.)
>>               */
>>              if (unlikely(signal_pending_state(state, task))) {
>> -                    mutex_remove_waiter(lock, &waiter,
>> -                                        task_thread_info(task));
>> -                    mutex_release(&lock->dep_map, 1, ip);
>> -                    spin_unlock_mutex(&lock->wait_lock, flags);
>> +                    ret = -EINTR;
>> +                    goto err;
>> +            }
>>  
>> -                    debug_mutex_free_waiter(&waiter);
>> -                    preempt_enable();
>> -                    return -EINTR;
>> +            if (!__builtin_constant_p(ww_ctx == NULL) && !ww_slow) {
>> +                    ret = __mutex_lock_check_stamp(lock, ww_ctx);
>> +                    if (ret)
>> +                            goto err;
>>              }
>> +
>>              __set_task_state(task, state);
>>  
>>              /* didn't get the lock, go to sleep: */
>> @@ -251,6 +407,30 @@ done:
>>      mutex_remove_waiter(lock, &waiter, current_thread_info());
>>      mutex_set_owner(lock);
>>  
>> +    if (!__builtin_constant_p(ww_ctx == NULL)) {
>> +            struct ww_mutex *ww = container_of(lock,
>> +                                                  struct ww_mutex,
>> +                                                  base);
>> +            struct mutex_waiter *cur;
>> +
>> +            /*
>> +             * This branch gets optimized out for the common case,
>> +             * and is only important for ww_mutex_lock.
>> +             */
>> +
>> +            ww_mutex_lock_acquired(ww, ww_ctx, ww_slow);
>> +            ww->ctx = ww_ctx;
>> +
>> +            /*
>> +             * Give any possible sleeping processes the chance to wake up,
>> +             * so they can recheck if they have to back off.
>> +             */
>> +            list_for_each_entry(cur, &lock->wait_list, list) {
>> +                    debug_mutex_wake_waiter(lock, cur);
>> +                    wake_up_process(cur->task);
>> +            }
>> +    }
>> +
>>      /* set it to 0 if there are no waiters left: */
>>      if (likely(list_empty(&lock->wait_list)))
>>              atomic_set(&lock->count, 0);
>> @@ -261,6 +441,14 @@ done:
>>      preempt_enable();
>>  
>>      return 0;
>> +
>> +err:
>> +    mutex_remove_waiter(lock, &waiter, task_thread_info(task));
>> +    spin_unlock_mutex(&lock->wait_lock, flags);
>> +    debug_mutex_free_waiter(&waiter);
>> +    mutex_release(&lock->dep_map, 1, ip);
>> +    preempt_enable();
>> +    return ret;
>>  }
>>  
>>  #ifdef CONFIG_DEBUG_LOCK_ALLOC
>> @@ -268,7 +456,8 @@ void __sched
>>  mutex_lock_nested(struct mutex *lock, unsigned int subclass)
>>  {
>>      might_sleep();
>> -    __mutex_lock_common(lock, TASK_UNINTERRUPTIBLE, subclass, NULL, 
>> _RET_IP_);
>> +    __mutex_lock_common(lock, TASK_UNINTERRUPTIBLE,
>> +                        subclass, NULL, _RET_IP_, 0, 0);
>>  }
> The pendant in me has to tell you 4x that NULL != 0 :-)
>
>> +EXPORT_SYMBOL_GPL(ww_mutex_lock);
>> +EXPORT_SYMBOL_GPL(ww_mutex_lock_interruptible);
>> +EXPORT_SYMBOL_GPL(ww_mutex_lock_slow);
>> +EXPORT_SYMBOL_GPL(ww_mutex_lock_slow_interruptible);
> Now having to do the _slow stuff saves lines and interface complexity!
It will also reduce useful debugging information returned a little. Danvet 
answered it better than me.

>> @@ -401,20 +738,39 @@ __mutex_lock_slowpath(atomic_t *lock_count)
>>  {
>>      struct mutex *lock = container_of(lock_count, struct mutex, count);
>>  
>> -    __mutex_lock_common(lock, TASK_UNINTERRUPTIBLE, 0, NULL, _RET_IP_);
>> +    __mutex_lock_common(lock, TASK_UNINTERRUPTIBLE, 0,
>> +                        NULL, _RET_IP_, 0, 0);
>>  }
>>  
>>  static noinline int __sched
>>  __mutex_lock_killable_slowpath(struct mutex *lock)
>>  {
>> -    return __mutex_lock_common(lock, TASK_KILLABLE, 0, NULL, _RET_IP_);
>> +    return __mutex_lock_common(lock, TASK_KILLABLE, 0,
>> +                               NULL, _RET_IP_, 0, 0);
>>  }
>>  
>>  static noinline int __sched
>>  __mutex_lock_interruptible_slowpath(struct mutex *lock)
>>  {
>> -    return __mutex_lock_common(lock, TASK_INTERRUPTIBLE, 0, NULL, _RET_IP_);
>> +    return __mutex_lock_common(lock, TASK_INTERRUPTIBLE, 0,
>> +                               NULL, _RET_IP_, 0, 0);
>>  }
> A few more cases where NULL != 0 :-)

But yeah, so open questions..

1. Do you still want to get rid of the _single variants, even though doing so 
would slightly reduce debugging?
2. Do you really want to drop the *_slow variants?
Doing so might reduce debugging slightly. I like method #2 in 
ww-mutex-design.txt, it makes it very clear why you
would handle the *_slow case differently anyway.
3. is a smp_mb needed to serialize lock->ctx with the atomic_read?

(mutex locked in fastpath, which is typically an atomic_dec operation)
smp_mb__after_atomic_dec();
lock->ctx = ..;
smp_mb();
if (atomic_read(lock->count) == 0) return;

feels a bit like overkill to me.

~Maarten

[PATCH v3 2/3] mutex: add support for wound/wait style locks, v3

Reply via email to