>I wonder if it's worth providing a set of "locked read" functions.
Most out-of-order machines provide “read acquire” and “write release” operations, which are pretty close to what you’re suggesting. With the current routines, we only have “read relaxed” and “write relaxed”. I think implementing acquire/release semantics is a very good idea. I would also like to clarify the properties of atomics. One very important question: are atomics also volatile? If so, the compiler has very limited ability to move them around. If not, it is difficult to tell when or where they will take place unless the surrounding code is peppered with barriers.
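To make the acquire/release idea concrete, here is a minimal sketch in terms of C11 `<stdatomic.h>`, not the routines under discussion. The names `payload`, `ready`, `publish` and `try_consume` are made up for illustration. The release store keeps the earlier write to `payload` from moving after it, and the acquire load keeps the later read of `payload` from moving before it, so no separate barriers are needed around the pair.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical example data: a plain (non-atomic) payload and a flag. */
static int payload;
static atomic_bool ready;   /* static storage, so it starts out false */

/* Producer: write the data, then publish it with a release store.
 * Accesses before the store may not be reordered after it. */
void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Consumer: check the flag with an acquire load.
 * Accesses after the load may not be reordered before it, so if the
 * flag is seen as true, the producer's write to payload is visible. */
bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

With `memory_order_relaxed` in both places, which is roughly what we have today, nothing would order the flag against `payload`. Note also that these C11 objects are not volatile unless declared so, which is exactly the question above: the ordering arguments constrain the compiler as well as the CPU, but only for the accesses they cover.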