Re: should sync builtins be full optimization barriers?

Geert Bosch Tue, 13 Sep 2011 07:58:36 -0700

On Sep 13, 2011, at 08:08, Andrew MacLeod wrote:

> On 09/12/2011 09:52 PM, Geert Bosch wrote:
>> No that's false. Even on systems with nice memory models, such as x86 and 
>> SPARC with a TSO model, you need a fence to avoid that a write-load of the 
>> same location is forced to
Note that by "write-load" I meant a write instruction *and* a subsequent 
load instruction.
>>  make it all the way to coherent memory and not forwarded directly from the 
>> write buffer or L1 cache. The reason that fences are expensive is exactly 
>> that they require system-wide agreement.
> 
> On x86, all the atomic operations are prefixed with LOCK which is supposed to 
> grant them exclusive use of shared memory. Ken's comments would appear to 
> indicate that imposes a total order across all processors.
Yes, that's right. All atomic read-modify-write operations have an implicit 
full barrier on x86 and on SPARC. However, my example was about regular stores 
to and loads from an atomic int using the C++ relaxed memory model. Indeed, just 
using XCHG (or SWAP on SPARC) instructions for writes and regular loads for 
reads is sufficient to establish a total order.


These are expensive synchronizing instructions though, with full barrier 
semantics. For the relaxed memory model, the compiler would be able to optimize 
away redundant loads and stores, as you indicated before.
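
To make the two kinds of write concrete, here is a minimal sketch using C++11
std::atomic. The names shared_value, store_ordered, store_relaxed, load_value
and sanity_check are made up for illustration, and the instruction choices in
the comments describe what x86 compilers typically emit, not a guarantee.

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> shared_value{0};  // hypothetical shared variable

// Sequentially consistent store: x86 compilers typically emit XCHG
// (or MOV followed by MFENCE), so the store acts as a full barrier.
void store_ordered(int v) {
    shared_value.store(v, std::memory_order_seq_cst);
}

// Relaxed store: typically a plain MOV on x86. The language rules
// permit, in principle, merging or dropping redundant relaxed accesses.
void store_relaxed(int v) {
    shared_value.store(v, std::memory_order_relaxed);
}

// Loads are typically a plain MOV on x86 for either ordering.
int load_value() {
    return shared_value.load(std::memory_order_seq_cst);
}

// Single-threaded sanity check: the last store wins.
void sanity_check() {
    store_relaxed(1);
    store_ordered(2);
    assert(load_value() == 2);
}
```

The cost difference is entirely on the store side: the load is the same
instruction in both cases.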

> I presume other architectures have similar mechanisms if they support atomic 
> operations.  You have to have *some* way of having 2 threads which 
> simultaneously perform read/modify/write atomic instructions work properly...
Yes, read-modify-write instructions also function as a full barrier.
> 
> Assume x=0, and 2 threads both execute a single atomic increment operation:
>  { read x, add 1, write result back to x }
> When both threads have finished, the result *has* to be x == 2.  So the 2 
> threads must be able to see some sort of coherent value for x.
Indeed. The trouble is with regular reads and writes.
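
The increment case can be sketched as follows, assuming C++11 std::atomic and
std::thread. run_increments is a hypothetical helper, not anything from GCC.
Because fetch_add is a read-modify-write, no increment is lost even with
relaxed ordering.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: each of `threads` threads performs `per_thread`
// atomic increments on a shared counter; the final value is returned.
int run_increments(int threads, int per_thread) {
    assert(threads > 0 && per_thread >= 0);
    std::atomic<int> x{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&x, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                // Read-modify-write: atomic even with relaxed ordering.
                x.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : pool) th.join();
    return x.load();
}
```

With two threads and one increment each, run_increments(2, 1) returns 2.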
> 
> If coherency is provided for read/modify/write, it should also be available 
> for read or write as well...


No, not unless you replace writes with read-modify-write instructions, or you insert 
additional fences. Regular writes are buffered, and initially only visible to 
the processor itself. The reason regular writes to memory are so fast is that 
the processor doesn't have to wait for the write to percolate down the memory 
hierarchy, but can continue processing using *its* last written value.
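
A minimal sketch of the write followed by load case, written as the usual
store-buffering test with C++11 atomics. both_read_zero and count_both_zero
are hypothetical names. With memory_order_seq_cst the outcome where both
threads read 0 is forbidden; with memory_order_relaxed it is allowed, and on
x86 it can really happen because each store may still sit in the write buffer
when the other thread loads.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: runs one round of the store-buffering test and
// reports whether both threads read the initial value 0.
bool both_read_zero(std::memory_order order) {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;
    std::thread t1([&] {
        x.store(1, order);   // write ...
        r1 = y.load(order);  // ... then load a different location
    });
    std::thread t2([&] {
        y.store(1, order);
        r2 = x.load(order);
    });
    t1.join();
    t2.join();
    return r1 == 0 && r2 == 0;
}

// Counts how many of `rounds` runs ended with both threads reading 0.
int count_both_zero(std::memory_order order, int rounds) {
    assert(rounds >= 0);
    int hits = 0;
    for (int i = 0; i < rounds; ++i) {
        if (both_read_zero(order)) ++hits;
    }
    return hits;
}
```

count_both_zero(std::memory_order_seq_cst, n) is always 0. The relaxed count
may or may not be nonzero on a given run, since it depends on timing.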

  -Geert
