On Wed, Jul 11, 2018 at 02:34:21PM +0200, Andrea Parri wrote:
> Simplicity is the eye of the beholder.  From my POV (LKMM maintainer), the
> simplest solution would be to get rid of rfi-rel-acq and unlock-rf-lock-po
> (or its analogous in v3) all together:

<snip>

> Among other things, this would immediately:
>
>   1) Enable RISC-V to use their .aq/.rl annotations _without_ having to
>      "worry" about tso or release/acquire fences; IOW, this will permit
>      a partial revert of:
>
>        0123f4d76ca6 ("riscv/spinlock: Strengthen implementations with fences")
>        5ce6c1f3535f ("riscv/atomic: Strengthen implementations with fences")

But I feel this goes in the wrong direction. This weakens the effective
memory model, where I feel we should strengthen it.

Currently PowerPC is the weakest here, and the above RISC-V changes
(reverts) would make RISC-V weaker still.

And any effective weakening makes me very uncomfortable -- who knows what
will come apart this time. This memory ordering stuff causes horrible
subtle bugs at best.

>   2) Resolve the above mentioned controversy (the inconsistency between
>      - locking operations and atomic RMWs on one side, and their actual
>      implementation in generic code on the other), thus enabling the use
>      of LKMM _and_ its tools for the analysis/reviewing of the latter.

This is a good point; so let's see if there is something we can do to
strengthen the model so it all works again.

And I think if we raise atomic*_acquire() to require TSO (but ideally
raise it to RCsc) we're there.

The TSO archs have RCpc load-acquire and store-release, but fully
ordered atomics. Most of the other archs have smp_mb() everything, with
the exception of PPC, ARM64 and now RISC-V.

PPC has the RCpc TSO fence LWSYNC, ARM64 has the RCsc
load-acquire/store-release. And RISC-V has a gazillion options IIRC.

So ideally atomic*_acquire() + smp_store_release() will be RCsc, and is
with the notable exception of PPC, and ideally RISC-V would be RCsc
here. But at the very least it should not be weaker than PPC.

By increasing atomic*_acquire() to TSO we also immediately get the
proposed:

	P0()
	{
		WRITE_ONCE(X, 1);
		spin_unlock(&s);
		spin_lock(&s);
		WRITE_ONCE(Y, 1);
	}

	P1()
	{
		r1 = READ_ONCE(Y);
		smp_rmb();
		r2 = READ_ONCE(X);
	}

behaviour under discussion; because the spin_lock() will imply the TSO
ordering.

And note that this retains regular RCpc ACQUIRE for smp_load_acquire()
and associated primitives -- as they have had since their introduction
not too long ago.
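
To make the claim about the test above concrete, here is a toy userspace
enumeration. It is only a sketch, not LKMM or herd7, and it rests on two
simplifying assumptions: each store becomes visible to all CPUs at a single
instant, and "no store->store ordering across the unlock+lock" is modelled
as P0's two stores possibly becoming visible in swapped order. All names in
it (struct run, outcome_reachable() and so on) are made up for this
illustration. P1's loads always stay in program order because of the
smp_rmb().

```c
#include <assert.h>
#include <stdbool.h>

/* State of one simulated run: shared variables and P1's registers. */
struct run { int X, Y, r1, r2; };

enum op { W_X, W_Y, R_Y, R_X };

static void exec_op(struct run *s, enum op o)
{
	switch (o) {
	case W_X: s->X = 1; break;
	case W_Y: s->Y = 1; break;
	case R_Y: s->r1 = s->Y; break;
	case R_X: s->r2 = s->X; break;
	}
}

/*
 * Try every interleaving of P0's two stores, taken in the order given by
 * p0[], with P1's two loads (always R_Y then R_X).  Returns true if some
 * interleaving ends with r1 == 1 && r2 == 0.
 */
static bool outcome_reachable(const enum op p0[2])
{
	static const enum op p1[2] = { R_Y, R_X };

	/* i and j are the global steps at which P0's stores take effect. */
	for (int i = 0; i < 4; i++) {
		for (int j = i + 1; j < 4; j++) {
			struct run s = { 0, 0, 0, 0 };
			int n1 = 0;

			for (int step = 0; step < 4; step++) {
				if (step == i)
					exec_op(&s, p0[0]);
				else if (step == j)
					exec_op(&s, p0[1]);
				else
					exec_op(&s, p1[n1++]);
			}
			assert(n1 == 2);	/* both loads were executed */
			if (s.r1 == 1 && s.r2 == 0)
				return true;
		}
	}
	return false;
}

/* Lock implies TSO: the stores can only become visible in program order. */
static bool weak_outcome_with_tso_lock(void)
{
	static const enum op in_order[2] = { W_X, W_Y };

	return outcome_reachable(in_order);
}

/* No store->store ordering: the stores may also become visible swapped. */
static bool weak_outcome_without_tso_lock(void)
{
	static const enum op in_order[2] = { W_X, W_Y };
	static const enum op swapped[2] = { W_Y, W_X };

	return outcome_reachable(in_order) || outcome_reachable(swapped);
}
```

In this toy model weak_outcome_with_tso_lock() returns false and
weak_outcome_without_tso_lock() returns true; the interleaving W_Y, R_Y,
R_X, W_X is the one that yields r1 == 1 && r2 == 0. A real answer for any
given model still needs herd7 and the actual LKMM files.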