On Wed, Aug 31, 2016 at 08:32:18PM +0200, Manfred Spraul wrote: > On 08/31/2016 06:40 PM, Will Deacon wrote:
> >The litmus test then looks a bit like: > > > >CPUm: > > > >LOCK(x) > >smp_mb(); > >RyAcq=0 > > > > > >CPUn: > > > >Wy=1 > >smp_mb(); > >UNLOCK_WAIT(x) > Correct. > > > >which I think can be simplified to: > > > > > >LOCK(x) > I thought that here a barrier is required, because Ry=0 could be before > store of the lock. > >Ry=0 > RyAcq instead of Ry would required due to the unlock at the end of the > critical section > CpuN: <...> > WyRelease=0 > for the litmus test irrelevant. > >Wy=1 > >smp_mb(); // Note that this is implied by spin_unlock_wait on PPC and arm64 > >LOCK(x) // spin_unlock_wait behaves like lock; unlock > >UNLOCK(x) > > >[I've removed a bunch of barriers here, that I don't think are necessary > > for the guarantees you're after] > > > >and the question is "Can both CPUs proceed?". > > > >Looking at the above, then I don't think that they can. Whilst CPUm can > >indeed speculate the Ry=0 before successfully taking the lock, if CPUn > >observes CPUm's read, then it must also observe the lock being held wrt > >the spin_lock API. That is because a successful LOCK operation by CPUn > >would force CPUm to replay its LL/SC loop and therefore discard its > >speculation of y. > > > >What am I missing? The code snippet seems to have too many barriers to me! > spin_unlock_wait() is not necessarily lock()+unlock(). > It can be a simple Rx, or now RxAcq. Can be, normally, yes. But on power and arm64, the only architectures on which the ACQUIRE is 'funny' they do the 'pointless' ll/sc cycle in spin_unlock_wait() to 'fix' things. So for both power and arm64, you can in fact model spin_unlock_wait() as LOCK+UNLOCK. All the other archs have (so far) 'sensible' ACQUIRE semantics and all this is moot. [ MIPS _could_ possibly do the 'interesting' ACQUIRE too, but so far hasn't introduced their fancy barriers. ] The other interesting case is qspinlock, which does the unordered store in software, but that is after a necessary atomic (ACQUIRE) operation to enqueue, which we exploit in the queued_spin_unlock_wait() to order against. So that too is good, assuming the ACQUIRE is good. Once Power/ARM64 use qspinlock, they'll need to be careful again.