On Mon, Nov 04, 2013 at 11:05:53AM +0000, Will Deacon wrote:
> On Sun, Nov 03, 2013 at 11:34:00PM +0000, Linus Torvalds wrote:
> > So it would *kind* of act like a "smp_wmb() + smp_rmb()", but the
> > problem is that a "smp_rmb()" doesn't really "attach" to the preceding
> > write.
>
> Agreed.
>
> > This is analogous to an "acquire" operation: you cannot make an
> > "acquire" barrier, because it's not a barrier *between* two ops, it's
> > associated with one particular op.
> >
> > So what I *think* you actually really really want is a "store with
> > release consistency, followed by a write barrier".
>
> How does that order reads against reads? (Paul mentioned this as a
> requirement). I'm not clear about the use case for this, so perhaps there
> is a dependency that I'm not aware of.

An smp_store_with_release_semantics() orders against prior reads -and-
writes. It maps to barrier() for x86, stlr for ARM, and lwsync for PowerPC,
as called out in my prototype definitions.

> > In TSO, afaik all stores have release consistency, and all writes are
> > ordered, which is why this is a no-op in TSO. And x86 also has that
> > "all stores have release consistency, and all writes are ordered"
> > model, even if TSO doesn't really describe the x86 model.
> >
> > But on ARM64, for example, I think you'd really want the store itself
> > to be done with "stlr" (store with release), and then follow up with a
> > "dsb st" after that.
>
> So a dsb is pretty heavyweight here (it prevents execution of *any* further
> instructions until all preceding stores have completed, as well as
> ensuring completion of any ongoing cache flushes). In conjunction with the
> store-release, that's going to hold everything up until the store-release
> (and therefore any preceding memory accesses) have completed. Granted, I
> think that gives Paul his read/read ordering, but it's a lot heavier than
> what's required.

I do not believe that we need the trailing "dsb st".

> > And notice how that requires you to mark the store itself. There is no
> > actual barrier *after* the store that does the optimized model.
> >
> > Of course, it's entirely possible that it's not worth worrying about
> > this on ARM64, and that just doing it as a "normal store followed by a
> > full memory barrier" is good enough. But at least in *theory* a
> > microarchitecture might make it much cheaper to do a "store with
> > release consistency" followed by "write barrier".
>
> I agree with the sentiment but, given that this stuff is so heavily
> microarchitecture-dependent (and not simple to probe), a simple dmb ish
> might be the best option after all.
> That's especially true if the
> microarchitecture decided to ignore the barrier options and treat everything
> as `all accesses, full system' in order to keep the hardware design simple.

I believe that we can do quite a bit better with current hardware
instructions (in the case of ARM, for a recent definition of "current")
and also simplify the memory ordering quite a bit.

							Thanx, Paul
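
The store-with-release ordering discussed above can be sketched in userspace
C11. This is only an illustration under stated assumptions, not the kernel
primitive or the prototype definitions mentioned in the thread: the names
payload, ready, publish() and try_consume() are hypothetical, and the C11
memory_order_release store stands in for a store with release semantics.
Compilers typically lower such a store to a plain store on x86, to stlr on
ARMv8, and to lwsync followed by a store on PowerPC.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
static int payload;       /* ordinary data written before the flag */
static atomic_int ready;  /* flag; static storage, so it starts at zero */

/*
 * Writer side: the release store keeps the earlier write to payload
 * (and any earlier reads) ordered before the store to ready.
 * No trailing barrier is issued after the store.
 */
static void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, 1, memory_order_release);
}

/*
 * Reader side: the acquire load pairs with the release store, so a
 * reader that observes ready == 1 also observes the payload write.
 * Returns false if nothing has been published yet.
 */
static bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

A writer thread would call publish(42) once, and reader threads would poll
try_consume() until it returns true. The ordering is attached to the store
itself rather than supplied by a separate barrier after it, which is the
distinction drawn in the quoted discussion.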