On 09/12/2011 09:52 PM, Geert Bosch wrote:
> No, that's false. Even on systems with nice memory models, such as x86 and SPARC
> with a TSO model, you need a fence to force a write followed by a load of the same
> location to go all the way to coherent memory, rather than having the load satisfied
> directly from the write buffer or L1 cache. The reason fences are
> expensive is exactly that they require system-wide agreement.
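The cost of a fence on a TSO machine is usually illustrated with the store-buffering (SB) litmus test. This is a standard example and was not taken from this thread. The C11 sketch below assumes POSIX threads, and its names (sb_forbidden_outcomes, x, y, r1, r2) are hypothetical. Each thread stores to one location and then loads the other. The seq_cst fence between the store and the load rules out the outcome where both loads return 0. Without the fence, TSO lets each thread's load complete while its own store is still sitting in its store buffer, so both loads can see 0.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Shared locations for the store-buffering (SB) litmus test. */
static atomic_int x, y;
/* Per-thread results; read only after pthread_join, so no data race. */
static int r1, r2;

/* Thread 1: store to x, full fence, then load y. */
static void *sb_thread1(void *arg) {
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    /* On x86, compilers typically emit MFENCE here. Without it the
       store to x may still be in the store buffer when y is loaded. */
    atomic_thread_fence(memory_order_seq_cst);
    r1 = atomic_load_explicit(&y, memory_order_relaxed);
    return NULL;
}

/* Thread 2: the mirror image, storing to y and then loading x. */
static void *sb_thread2(void *arg) {
    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    r2 = atomic_load_explicit(&x, memory_order_relaxed);
    return NULL;
}

/* Run the test `trials` times and count how often r1 == 0 && r2 == 0,
   the outcome that the two seq_cst fences forbid. */
int sb_forbidden_outcomes(int trials) {
    int forbidden = 0;
    for (int i = 0; i < trials; i++) {
        pthread_t t1, t2;
        atomic_store(&x, 0);
        atomic_store(&y, 0);
        pthread_create(&t1, NULL, sb_thread1, NULL);
        pthread_create(&t2, NULL, sb_thread2, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        if (r1 == 0 && r2 == 0)
            forbidden++;
    }
    return forbidden;
}
```

With the fences in place, sb_forbidden_outcomes should always return 0. If you delete the two fences, the 0/0 outcome becomes legal. Because each trial creates fresh threads, though, the race window is small, and a loop like this one may only rarely catch that outcome.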
On x86, all the atomic read/modify/write operations are prefixed with LOCK,
which is supposed to grant them exclusive access to shared memory. Ken's
comments appear to indicate that this imposes a total order across all processors.
I presume other architectures have similar mechanisms if they support
atomic operations. There has to be *some* way for 2 threads that
simultaneously perform atomic read/modify/write instructions to work
properly...
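One common form of such a mechanism is compare-and-swap. The C11 sketch below builds a generic atomic read/modify/write (here, multiply) out of a CAS retry loop. The helper name fetch_mul_cas is hypothetical, since C11 provides no atomic multiply. With GCC on x86, atomic_compare_exchange_weak typically compiles to a LOCK CMPXCHG instruction. On architectures without a CAS instruction, such as POWER, it is typically built from a load-linked/store-conditional pair.

```c
#include <stdatomic.h>

/* Hypothetical helper: atomically multiply *obj by factor and return
   the value it held just before the multiplication took effect. */
int fetch_mul_cas(atomic_int *obj, int factor) {
    int expected = atomic_load(obj);
    /* If another thread changes *obj between our read and the CAS, the
       CAS fails, stores the current value into `expected`, and we retry.
       The weak form may also fail spuriously, which the loop absorbs. */
    while (!atomic_compare_exchange_weak(obj, &expected, expected * factor))
        ;
    return expected;
}
```

For example, with `atomic_int v = 3;`, the call `fetch_mul_cas(&v, 5)` returns 3 and leaves v equal to 15.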
Assume x=0, and 2 threads both execute a single atomic increment operation:
{ read x, add 1, write result back to x }
When both threads have finished, the result *has* to be x == 2. So the
2 threads must be able to see some sort of coherent value for x.
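The increment example maps directly onto C11 atomics. This sketch assumes POSIX threads, and the names two_thread_increment and counter are hypothetical. Each thread performs exactly one atomic_fetch_add, so the final value is always 2. If each increment were instead a plain load, add, and store on a non-atomic int, both threads could read 0 and both write back 1.

```c
#include <pthread.h>
#include <stdatomic.h>

static atomic_int counter;

/* Each thread performs exactly one atomic read/modify/write. */
static void *increment_once(void *arg) {
    (void)arg;
    atomic_fetch_add(&counter, 1); /* a LOCK-prefixed add on x86 */
    return NULL;
}

/* Start counter at 0, run two incrementing threads, return the result. */
int two_thread_increment(void) {
    pthread_t a, b;
    atomic_store(&counter, 0);
    pthread_create(&a, NULL, increment_once, NULL);
    pthread_create(&b, NULL, increment_once, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return atomic_load(&counter);
}
```

Every call to two_thread_increment() returns 2, however the two threads interleave.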
If coherency is provided for read/modify/write, it should be available
for plain reads and writes as well...
Andrew