On Sun, Nov 3, 2013 at 2:42 PM, Paul E. McKenney
<paul...@linux.vnet.ibm.com> wrote:
>
> smp_storebuffer_mb() -- A barrier that enforces those orderings
>         that do not invalidate the hardware store-buffer optimization.

Ugh. Maybe. Can you guarantee that those are the correct semantics?
And why talk about the hardware semantics, when you really want
specific semantics for the *software*?

> smp_not_w_r_mb() -- A barrier that orders everything except prior
>         writes against subsequent reads.

Ok, that sounds more along the lines of "these are the semantics we
want", but I have to say, it also doesn't make me go "ahh, ok".

> smp_acqrel_mb() -- A barrier that combines C/C++ acquire and release
>         semantics.  (C/C++ "acquire" orders a specific load against
>         subsequent loads and stores, while C/C++ "release" orders
>         a specific store against prior loads and stores.)

I don't think this is true. acquire+release is much stronger than what
you're looking for - it doesn't allow subsequent reads to move past
the write (because that would violate the acquire part). On x86, for
example, you'd need to have a locked cycle for smp_acqrel_mb().

So again, what are the guarantees you actually want? Describe those.
And then make a name.

I _think_ the guarantees you want are:

 - SMP write barrier

 - *local* read barrier for reads preceding the write.

but the problem is that the "preceding reads" part is really
specifically about the write that you had. The barrier should really
be attached to the *particular* write operation, it cannot be a
standalone barrier.

So it would *kind* of act like a "smp_wmb() + smp_rmb()", but the
problem is that a "smp_rmb()" doesn't really "attach" to the preceding
write.

This is analogous to an "acquire" operation: you cannot make an
"acquire" barrier, because it's not a barrier *between* two ops, it's
associated with one particular op.

So what I *think* you actually really really want is a "store with
release consistency, followed by a write barrier".

In TSO, afaik all stores have release consistency, and all writes are
ordered, which is why this is a no-op in TSO.
And x86 also has that "all stores have release consistency, and all
writes are ordered" model, even if TSO doesn't really describe the
x86 model.

But on ARM64, for example, I think you'd really want the store itself
to be done with "stlr" (store with release), and then follow up with a
"dsb st" after that.

And notice how that requires you to mark the store itself. There is no
actual barrier *after* the store that does the optimized model.

Of course, it's entirely possible that it's not worth worrying about
this on ARM64, and that just doing it as a "normal store followed by a
full memory barrier" is good enough. But at least in *theory* a
microarchitecture might make it much cheaper to do a "store with
release consistency" followed by "write barrier".

Anyway, having talked exhaustively about exactly what semantics you
are after, I *think* the best model would be to just have a

  #define smp_store_with_release_semantics(x, y) ...

and use that *and* a "smp_wmb()" for this (possibly a special
"smp_wmb_after_release()" if that allows people to avoid double
barriers). On x86 (and TSO systems), the
smp_store_with_release_semantics() would be just a regular store, and
the smp_wmb() is obviously a no-op. Other platforms would end up doing
other things.

Hmm?

         Linus
_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
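
For illustration only, the "store with release semantics, then write
barrier" pairing described in the mail can be approximated in userspace
with C11 atomics. This is a rough sketch under stated assumptions, not
kernel code and not a proposed implementation: the two macro names are
taken from the mail, while struct channel, publish() and self_check()
are hypothetical names made up for the example. C11 has no
store-store-only fence, so a release fence stands in for the write
barrier here; it is stronger than a plain smp_wmb().

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Hypothetical userspace stand-ins for the two primitives named in the
 * mail, built on C11 atomics. Assumption: this is NOT how the kernel
 * defines (or would define) them.
 */
#define smp_store_with_release_semantics(p, v) \
	atomic_store_explicit((p), (v), memory_order_release)

/*
 * Approximation: C11 offers no store-store-only fence, so a release
 * fence is used. It orders more than a pure write barrier would.
 */
#define smp_wmb_after_release() \
	atomic_thread_fence(memory_order_release)

/* Hypothetical shared state, invented for this example. */
struct channel {
	atomic_int consumed;	/* written by the other side, read here */
	atomic_int payload;	/* the store that carries release semantics */
	atomic_int ready;	/* a later store, ordered after payload */
};

/* Publish a value; returns the 'consumed' value read beforehand. */
static int publish(struct channel *c, int value)
{
	/* Preceding read: must not be reordered after the payload store. */
	int seen = atomic_load_explicit(&c->consumed, memory_order_relaxed);

	/* Release store: ordered after all earlier loads and stores. */
	smp_store_with_release_semantics(&c->payload, value);

	/* Write barrier: keeps the payload store ahead of the next store. */
	smp_wmb_after_release();
	atomic_store_explicit(&c->ready, 1, memory_order_relaxed);

	return seen;
}

/* Single-threaded sanity check; it exercises the API, not the ordering. */
static int self_check(void)
{
	struct channel c;

	atomic_init(&c.consumed, 7);
	atomic_init(&c.payload, 0);
	atomic_init(&c.ready, 0);

	int seen = publish(&c, 42);
	assert(seen == 7);

	return atomic_load(&c.payload);
}
```

Note that the ordering is attached to the payload store itself, which
is the point made in the mail: the release part cannot be expressed as
a standalone barrier placed after the store. On x86, compilers
typically emit a plain store for the release store and no instruction
at all for the release fence, which matches the "no-op on TSO"
observation; on AArch64 the release store typically becomes "stlr".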