On Fri, May 20, 2016 at 10:00:45AM -0700, Linus Torvalds wrote:
> On Fri, May 20, 2016 at 9:04 AM, Peter Zijlstra <pet...@infradead.org> wrote:
> > + * queued_spin_lock_slowpath() can ACQUIRE the lock before
> > + * issuing the unordered store that sets _Q_LOCKED_VAL.
>
> Ugh. This was my least favorite part of the queued locks, and I really
> liked the completely unambiguous semantics of the x86 atomics-based
> versions we used to have.
>
> But I guess we're stuck with it.

Yeah, I wasn't too happy either when I realized it today. We _could_
make these atomic ops, but then we're just making things slower for
this one weird case :/

> That said, it strikes me that almost all of the users of
> "spin_is_locked()" are using it for verification purposes (not locking
> correctness),

Right; although I feel those people should be using
lockdep_assert_held() for this instead. That not only compiles out when
!LOCKDEP but also asserts the current task/cpu is the lock owner, not
someone else.
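To illustrate (a minimal sketch; the structure and function here are
made up, only the two assertions are real API):

	/* Hypothetical driver code, just to show the difference: */
	static void frob_update(struct frob_dev *fd)
	{
		/*
		 * Compiles away when lockdep is disabled, and verifies
		 * that the *current* context holds fd->lock:
		 */
		lockdep_assert_held(&fd->lock);

		/*
		 * ... whereas this only checks that somebody holds it,
		 * and costs a load even in production builds:
		 *
		 *	WARN_ON_ONCE(!spin_is_locked(&fd->lock));
		 */

		fd->counter++;
	}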
> and that the people who are playing games with locking
> correctness are few and already have to play *other* games anyway.
>
> See for example "ipc_smp_acquire__after_spin_is_unlocked()", which has
> a big comment atop of it that now becomes nonsensical with this patch.

Not quite; we still need that I think. We now have:

	spin_lock(A);
	smp_mb();
	while (spin_is_locked(B))
		cpu_relax();
	smp_rmb();

And that control dependency together with the rmb form a load-acquire
on the unlocked B, which matches the release of the spin_unlock(B) and
ensures we observe the whole previous critical section we waited for.

The new smp_mb() doesn't help with that.

> Now, I'd take Peter's patch as-is, because I don't think any of this
> matters from a *performance* standpoint, and Peter's patch is much
> smaller and simpler.

I would suggest you do this and also mark it for stable, v4.2 and
later.

> But the reason I think it might be a good thing to introduce those
> spin_lock_synchronize() and spin_lock_acquire_after_unlock() concepts
> would be to make it very very clear what those subtle implementations
> in mutexes and the multi-level locks in the ipc layer are doing and
> what they rely on.

We can always do the fancy stuff on top, but that isn't going to need
backporting to all the stable trees; this is.

I'll think a little more on the explicit document vs simple thing.
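To make that last bit concrete, the acquire-after-unlock side could be
little more than the pattern above wrapped in a helper; a rough sketch
only, this is not an existing kernel interface:

	/*
	 * Sketch only: wait for the current owner of @lock (if any)
	 * to release it, and order all our subsequent loads after its
	 * critical section.
	 */
	static inline void spin_lock_acquire_after_unlock(spinlock_t *lock)
	{
		while (spin_is_locked(lock))
			cpu_relax();
		smp_rmb();	/* ctrl-dep + rmb == load-acquire of the unlock */
	}

The smp_mb() after the preceding spin_lock() would still be required at
the call site, as in the sequence above.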