On Wed, Aug 31, 2016 at 06:59:07AM +0200, Manfred Spraul wrote:
> The barrier must ensure that taking the spinlock (as observed by another cpu
> with spin_unlock_wait()) and a following read are ordered.
>
> start condition: sma->complex_mode = false;
>
> CPU 1:
>	spin_lock(&sem->lock); /* sem_nsems instances */
>	smp_mb__after_spin_lock();
>	if (!smp_load_acquire(&sma->complex_mode)) {
>		/* fast path successful! */
>		return sops->sem_num;
>	}
>	/* slow path, not relevant */
>
> CPU 2: (holding sma->sem_perm.lock)
>
>	smp_store_mb(sma->complex_mode, true);
>
>	for (i = 0; i < sma->sem_nsems; i++) {
>		spin_unlock_wait(&sma->sem_base[i].lock);
>	}
>
> It must not happen that both CPUs proceed:
> Either CPU1 proceeds, then CPU2 must spin in spin_unlock_wait()
> or CPU2 proceeds, then CPU1 must enter the slow path.
>
> What about this?
> /*
>  * spin_lock() provides ACQUIRE semantics regarding reading the lock.
>  * There are no guarantees that the store of the lock is visible before
>  * any read or write operation within the protected area is performed.
>  * If the store of the lock must happen first, this function is required.
>  */
> #define spin_lock_store_acquire()
So I think the fundamental problem is with our atomic_*_acquire() primitives,
where we've specified that the ACQUIRE only pertains to the LOAD of the RmW.
The spinlock implementations suffer this problem mostly because of that (not
100% accurate but close enough).

One solution would be to simply use smp_mb__after_atomic(). The 'problem'
with that is that __atomic_op_acquire() defaults to using it, so the archs
that use __atomic_op_acquire() will get a double smp_mb() (arm64 and powerpc
do not use __atomic_op_acquire()).

I'm not sure we want to introduce a new primitive for this specific to
spinlocks.

Will, any opinions?
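FWIW, a minimal sketch of what the two sides would look like with the
smp_mb__after_atomic() approach. The struct layout and field names follow
Manfred's example above; sem_lock_fastpath() and sem_wait_array_sketch() are
made-up names for illustration, not the actual ipc/sem.c functions:

#include <linux/spinlock.h>
#include <linux/atomic.h>
#include <linux/types.h>

/* Sketch only: layout follows the quoted example, not current ipc/sem.c. */
struct sem {
	spinlock_t	lock;
};

struct sem_array {
	bool		complex_mode;
	struct sem	*sem_base;
	int		sem_nsems;
};

/* CPU 1 side: per-semaphore fast path. */
static int sem_lock_fastpath(struct sem_array *sma, int sem_num)
{
	struct sem *sem = &sma->sem_base[sem_num];

	spin_lock(&sem->lock);

	/*
	 * Order the RmW that takes sem->lock against the load of
	 * sma->complex_mode below; plain spin_lock() only guarantees
	 * ACQUIRE on the load part of the RmW.
	 */
	smp_mb__after_atomic();

	if (!smp_load_acquire(&sma->complex_mode))
		return sem_num;		/* fast path successful! */

	spin_unlock(&sem->lock);
	return -1;			/* caller falls back to the slow path */
}

/* CPU 2 side, holding sma->sem_perm.lock: switch to complex mode. */
static void sem_wait_array_sketch(struct sem_array *sma)
{
	int i;

	smp_store_mb(sma->complex_mode, true);

	for (i = 0; i < sma->sem_nsems; i++)
		spin_unlock_wait(&sma->sem_base[i].lock);
}

The trade-off is the one noted above: archs whose acquire ops go through
__atomic_op_acquire() already end the RmW with smp_mb__after_atomic(), so
the explicit barrier after spin_lock() gives them a double smp_mb().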