Re: should sync builtins be full optimization barriers?

Geert Bosch Sun, 11 Sep 2011 12:00:31 -0700

On Sep 11, 2011, at 10:12, Andrew MacLeod wrote:

>> To be honest, I can't quite see the use of completely unordered
>> atomic operations, where we do not even prohibit compiler optimizations.
>> It would seem if we guarantee that a variable will not be accessed
>> concurrently from any other thread, we wouldn't need the operation
>> to be atomic in the first place. That said, it's quite likely I'm
>> missing something here.
>> 
> There is no guarantee it isn't being accessed concurrently; we are only
> guaranteeing that if it is accessed from another thread, it won't be a
> partially written value. If you read a 64-bit value on a 32-bit machine,
> you need to guarantee that both halves are fully written before any read can
> happen. That's the bare minimum guarantee of an atomic.
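
To make the "no partially written value" guarantee concrete, here is a minimal C++11 sketch. The names `shared` and `reader_saw_only_whole_values` are made up for illustration and do not come from GCC or from this thread. A writer alternates between two 64-bit patterns using relaxed stores, and the reader checks that every relaxed load returns one of the two whole patterns, never a mix of halves.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Hypothetical shared variable, for illustration only.
std::atomic<std::uint64_t> shared{0};

// Returns true if the reader only ever observed one of the two complete
// 64-bit patterns. Relaxed ordering promises nothing about ordering with
// respect to other objects, only that each access is indivisible.
bool reader_saw_only_whole_values(int iterations) {
    const std::uint64_t a = 0x0000000000000000ULL;
    const std::uint64_t b = 0xFFFFFFFFFFFFFFFFULL;

    std::thread writer([&] {
        for (int i = 0; i < iterations; ++i)
            shared.store((i & 1) ? a : b, std::memory_order_relaxed);
    });

    bool ok = true;
    for (int i = 0; i < iterations; ++i) {
        std::uint64_t v = shared.load(std::memory_order_relaxed);
        if (v != a && v != b)
            ok = false;  // a torn read would land here
    }

    writer.join();
    return ok;
}
```

With a plain `std::uint64_t` in place of the atomic, the same program would contain a data race, and on a 32-bit target the load could legitimately be split into two halves.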


OK, I now see (in §1.10(5) of the n3225 draft) that “relaxed” atomic operations 
are not synchronization operations even though, like synchronization 
operations, they cannot contribute to data races. 

However the next paragraph says: 
All modifications to a particular atomic object M occur in some particular 
total order, called the modification order of M. [...] There is a separate 
order for each atomic object. There is no requirement that these can be 
combined into a single total order for all objects. In general this will be 
impossible since different threads may observe modifications to different 
objects in inconsistent orders.

So, if I understand correctly, then operations using relaxed memory order will 
still need fences, but indeed do not require any optimization barrier. For 
memory_order_seq_cst we'll need a full barrier, and for the others there is a 
partial barrier.
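
As an illustration of one of the "partial barrier" orderings, here is a small C++11 sketch of release/acquire publication. The names `payload`, `ready` and `publish_and_consume` are made up for this example. The release store keeps the earlier write to `payload` from moving after it, and the acquire load keeps the later read of `payload` from moving before it. Neither needs to act as a full barrier in both directions.

```cpp
#include <atomic>
#include <thread>

int payload = 0;                 // ordinary, non-atomic data
std::atomic<bool> ready{false};  // hypothetical flag guarding the payload

// Publishes 42 from one thread and reads it from another.
int publish_and_consume() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);

    std::thread producer([] {
        payload = 42;                                  // plain write
        ready.store(true, std::memory_order_release);  // may not sink above this
    });

    // Spin until the flag is seen; the acquire load synchronizes with the
    // release store, so the write to payload is visible afterwards.
    while (!ready.load(std::memory_order_acquire)) {
    }
    int seen = payload;

    producer.join();
    return seen;
}
```

If both accesses to `ready` were relaxed instead, the read of `payload` would be a data race, since nothing would order it after the write.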

Also, for relaxed order atomic operations we would only need a single fence 
between two accesses (by a thread) to the same atomic object. 
> 
>> For Ada, all atomic accesses are always memory_order_seq_cst, and we
>> just care about being able to optimize accesses if we know they'll be
>> done from the same processor. For the C++11 model, thinking about
>> the semantics of any memory orders other than memory_order_seq_cst
>> and their interaction with operations with different ordering semantics
>> makes my head hurt.
> I had many headaches over a long period wrapping my head around it, but
> ultimately it maps pretty closely to various hardware implementations. Best
> bet? Just use seq-cst until you discover you have a performance problem!!
> I expect that's why it's the default :-)

We've already discovered that. Atomic types are used quite a bit in Ada code. 
Unfortunately, many of the uses are just for accesses to memory-mapped I/O 
devices, single write. On many systems I/O locations can't be used for 
synchronization anyway, and only regular cacheable memory can be used for that.

For such operations you don't want the compiler to reorder accesses to 
different I/O locations, but mutual exclusion wrt. other threads is already 
taken care of. It seems this is precisely the opposite from what the relaxed 
memory order provides.
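
In C++ terms (not Ada), the property wanted here looks closer to what `volatile` gives than to a relaxed atomic. The sketch below is an assumption-laden illustration: `DeviceRegs` and `start_transfer` are hypothetical, and a real driver would map the structure onto a device address rather than ordinary memory. The compiler may not reorder or elide the two volatile stores relative to each other, but `volatile` by itself says nothing about atomicity across threads or about hardware write ordering.

```cpp
#include <cstdint>

// Hypothetical register layout of a memory-mapped device.
struct DeviceRegs {
    volatile std::uint32_t control;
    volatile std::uint32_t data;
};

// Writes the data register first, then the control register. Both are
// volatile accesses, so the compiler must emit them in program order.
void start_transfer(DeviceRegs& dev, std::uint32_t value) {
    dev.data = value;   // must be emitted first
    dev.control = 1u;   // must be emitted second
}
```

On hardware that can reorder writes to device memory, an explicit I/O barrier would still be needed between the two stores; that part is outside what the language guarantees.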

Regards,
  -Geert
