On Sat Apr 13, 2024 at 7:48 PM AEST, Michael Ellerman wrote:
> Nicholas Piggin <npig...@gmail.com> writes:
> > "Fully ordered" atomics (RMW that return a value) are said to have a
> > full barrier before and after the atomic operation. This is implemented
> > as:
> >
> >   hwsync
> >   larx
> >   ...
> >   stcx.
> >   bne-
> >   hwsync
> >
> > This is slow on POWER processors because hwsync and stcx. require a
> > round-trip to the nest (~= L2 cache). The hwsyncs can be avoided with
> > the sequence:
> >
> >   lwsync
> >   larx
> >   ...
> >   stcx.
> >   bne-
> >   isync
> >
> > lwsync prevents all reorderings except store/load reordering, so the
> > larx could be executed ahead of a prior store becoming visible. However
> > the stcx. is a store, so it is ordered by the lwsync against all prior
> > access and if the value in memory had been modified since the larx, it
> > will fail. So the point at which the larx executes is not a concern
> > because the stcx. always verifies the memory was unchanged.
> >
> > The isync prevents subsequent instructions being executed before the
> > stcx. executes, and stcx. is necessarily visible to the system after
> > it executes, so there is no opportunity for it (or prior stores, thanks
> > to lwsync) to become visible after a subsequent load or store.
>
> AFAICS commit b97021f85517 ("powerpc: Fix atomic_xxx_return barrier
> semantics") disagrees.
>
> That was 2011, so maybe it's wrong or outdated?

Hmm, thanks for the reference. I didn't know about that.

isync, or ordering the execution / completion of a load after a previous
plain store, doesn't guarantee ordering, because stores drain from queues
and become visible some time after they complete.

Okay, but I was thinking a successful stcx. should have to be visible.
I guess that's not true if it broke on P7. Maybe it's possible in the
architecture for a successful stcx. not to have this property though;
I find it pretty hard to read this part of the architecture. Clearly it
has to be visible to other processors performing larx/stcx., otherwise
the canonical stcx. ; bne- ; isync sequence can't provide mutual
exclusion.

I wonder if the P7 breakage was caused by the "wormhole coherency"
thing that was changed ("fixed") in POWER9. I'll have to look into it a
bit more. The cache coherency guy I was talking to retired before
answering :/ I'll have to find another victim.

I should try to break a P8 too.

>
> Either way it would be good to have some litmus tests to back this up.
>
> cheers
>
> ps. isn't there a rule against sending barrier patches after midnight? ;)

There should be if not.

Thanks,
Nick