> Whatever the name is, do you contend that store-before-load/store is
> _useful_?  Can you show why?  And, can you show an architecture where
> it's actually cheaper than membar_sync?

The first two are good questions.  (I don't have an answer to them, but
I haven't had much to say on this thread, either.)

But the third...that looks to me as though there's an implicit "no?
then let's just use membar_sync for those uses".  And that, I maintain,
is a terrible idea: conflating two conceptually distinct operations just
because the one is, at present, most efficiently implemented as the
other - that leads to code that doesn't draw the distinction and ends up
crippled, in at least performance and possibly functionality, when a use
case shows up for which that is _not_ true.

Until and unless there are use cases for store-before-load/store, this
is of no immediate importance.  But, if/when there are, I would very
much prefer to see them represented as distinct operations, even if they
are implemented the same way on all current ports.

Of course, it's possible that's not where you were (possibly not
consciously) going with that, in which case the above is moot, or at
least close to it.

> I would rather avoid introducing a proliferation of membar names,
> because the more there are, the more confusing the choice is.

True, but I think the code - and the coders - would benefit from
thinking enough to get it right rather than just shrugging and resorting
to a sledgehammer in all cases.  Consider where we'd be today if
everyone had used a full memory barrier in all cases: all uses would
need inspecting to see what kind of barrier they actually need...or
forget it and eat the performance cost of an often much stricter barrier
than necessary.

> Having nicely paired names helps: if you see `membar_exit', that's a
> hint you should see a corresponding `membar_enter' -- and if you
> don't, that should raise alarm bells in your head.

True, but I also think that having names like that assumes things about
their uses that are not necessarily valid.
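
To make the "distinct operations, same implementation" point concrete,
here is a minimal sketch in portable C11.  Every name in it
(barrier_full, barrier_store_before_load_store, try_claim, the two
flags) is made up for illustration; none of it is an existing or
proposed NetBSD interface.  It assumes, as far as I know correctly, that
C11 offers no fence weaker than seq_cst that orders an earlier store
before a later load, so both wrappers expand to the same fence today.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Hypothetical names, not an existing API.  Both wrappers use the same
 * fence today, but each call site records which ordering it needs.
 */

/* Full barrier: all earlier loads/stores before all later loads/stores. */
static inline void
barrier_full(void)
{
	atomic_thread_fence(memory_order_seq_cst);
}

/*
 * Store-before-load/store: earlier stores before later loads/stores.
 * Assumption: no cheaper C11 fence gives store-before-load ordering,
 * so this is currently the full fence as well.
 */
static inline void
barrier_store_before_load_store(void)
{
	atomic_thread_fence(memory_order_seq_cst);
}

/* Static atomics start out zero-initialized. */
static atomic_int my_flag, other_flag;

/*
 * Dekker-style handshake, one side only: announce ourselves, then look
 * at the other side.  The store must be ordered before the load.
 * Returns 1 if the other side has not announced itself.
 */
static int
try_claim(void)
{
	atomic_store_explicit(&my_flag, 1, memory_order_relaxed);
	barrier_store_before_load_store();	/* the ordering we need, by name */
	return atomic_load_explicit(&other_flag, memory_order_relaxed) == 0;
}
```

If some port later grows a cheaper way to get store-before-load/store,
only the body of barrier_store_before_load_store() has to change; no
call site needs to be audited to find out which ordering it wanted.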

At a past job, I had occasion to write code that called for memory
barriers - and I found myself resorting to looking at what the hardware
in question provided and choosing from those, because none of the
higher-level calls available matched my use case well enough for me to
be confident they were right.  (It was part of a turnkey system, so
assuming what the underlying hardware was was more reasonable than it is
for NetBSD.)

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B