On Sun, Nov 03, 2013 at 11:34:00PM +0000, Linus Torvalds wrote:
> So it would *kind* of act like a "smp_wmb() + smp_rmb()", but the
> problem is that a "smp_rmb()" doesn't really "attach" to the preceding
> write.

Agreed.

> This is analogous to a "acquire" operation: you cannot make an
> "acquire" barrier, because it's not a barrier *between* two ops, it's
> associated with one particular op.
>
> So what I *think* you actually really really want is a "store with
> release consistency, followed by a write barrier".

How does that order reads against reads? (Paul mentioned this as a
requirement). I'm not clear about the use case for this, so perhaps there
is a dependency that I'm not aware of.

> In TSO, afaik all stores have release consistency, and all writes are
> ordered, which is why this is a no-op in TSO. And x86 also has that
> "all stores have release consistency, and all writes are ordered"
> model, even if TSO doesn't really describe the x86 model.
>
> But on ARM64, for example, I think you'd really want the store itself
> to be done with "stlr" (store with release), and then follow up with a
> "dsb st" after that.

So a dsb is pretty heavyweight here (it prevents execution of *any* further
instructions until all preceding stores have completed, as well as
ensuring completion of any ongoing cache flushes). In conjunction with the
store-release, that's going to hold everything up until the store-release
(and therefore any preceding memory accesses) has completed. Granted, I
think that gives Paul his read/read ordering, but it's a lot heavier than
what's required.

> And notice how that requires you to mark the store itself. There is no
> actual barrier *after* the store that does the optimized model.
>
> Of course, it's entirely possible that it's not worth worrying about
> this on ARM64, and that just doing it as a "normal store followed by a
> full memory barrier" is good enough. But at least in *theory* a
> microarchitecture might make it much cheaper to do a "store with
> release consistency" followed by "write barrier".

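
For concreteness, the two shapes being compared could be sketched in
portable C11 atomics roughly as below. This is only an illustration under
stated assumptions: the variable and function names are made up, C11 has no
store-only fence, so a release fence (which is stronger than smp_wmb())
stands in for the write barrier, and nothing here emits a "dsb st". On
arm64, compilers typically lower the release store to stlr and the seq_cst
fence to dmb ish, but that is up to the toolchain.

```c
#include <stdatomic.h>

/* Hypothetical shared locations, for illustration only. */
_Atomic int slot;	/* the store that gets "marked" */
_Atomic int later;	/* some store issued after the barrier */

/*
 * Shape 1: "store with release consistency, followed by a write barrier".
 * The ordering against earlier accesses is attached to the store itself.
 * The release fence approximates the write barrier (assumption: C11 has
 * no weaker, store-store-only fence).
 */
void store_release_then_wmb(int v)
{
	atomic_store_explicit(&slot, v, memory_order_release);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&later, 1, memory_order_relaxed);
}

/*
 * Shape 2: "normal store followed by a full memory barrier".
 * The store is unmarked; all the ordering comes from the fence after it.
 */
void store_then_full_barrier(int v)
{
	atomic_store_explicit(&slot, v, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	atomic_store_explicit(&later, 1, memory_order_relaxed);
}
```

The two functions are not claimed to give identical guarantees; the point
is only where the ordering lives (on the store versus in a separate
barrier), which is what decides how cheap a given microarchitecture can
make it.
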
I agree with the sentiment but, given that this stuff is so heavily
microarchitecture-dependent (and not simple to probe), a simple dmb ish
might be the best option after all. That's especially true if the
microarchitecture decided to ignore the barrier options and treat
everything as `all accesses, full system' in order to keep the hardware
design simple.

Will