On Mon, 2013-08-05 at 15:03 +0100, Richard Sandiford wrote:
> Sorry for the long mail and for what's probably an FAQ.  I did try to find
> an answer without bothering the list... (and showing my ignorance so much :-))
> 
> At the moment, the s390 backend treats all atomic loads as simple loads
> and only uses serialisation instructions for atomic stores.  I just wanted
> to check whether this was really the right behaviour.
> 
> The architecture has strong memory-ordering semantics in which a CPU is
> only allowed to move a store after a later load; the other three combinations
> cannot happen.  The current implementation seems fine from that point of view,
> because it means that a serialising instruction after a store is enough
> to prevent any reordering.  However, page 5-126 of the architecture
> manual[*] says:
> 
>   Following is an example showing the effects of serialization. Location
>   A initially contains FF hex.
> 
>   CPU 1                  CPU 2
>   MVI A,X'00'       G    CLI A,X'00'
>   BCR 15,0               BNE G
> 
>   The BCR 15,0 instruction executed by CPU 1 is a serializing
>   instruction that ensures that the store by CPU 1 at location A is
>   completed. However, CPU 2 may loop indefinitely, or until the next
>   interruption on CPU 2, because CPU 2 may already have fetched from
>   location A for every execution of the CLI instruction. A serializing
>   instruction must be in the CPU-2 loop to ensure that CPU 2 will again
>   fetch from location A.
> 
> Does the new C/C++ memory model allow that kind of infinite loop 
> even for sequentially-consistent atomic loads?  The draft text was:
> 
> [29.3.3]
>   There shall be a single total order S on all memory_order_seq_cst
>   operations, consistent with the “happens before” order and modification
>   orders for all affected locations, such that each memory_order_seq_cst
>   operation that loads a value observes either the last preceding
>   modification according to this order S, or the result of an operation
>   that is not memory_order_seq_cst.
> 
> but when I asked around, no one could see anything in the standard that
> prevents the total order from having an infinite sequence of loads
> between two stores.  That feels like a cheat though. :-)

It would be a correct execution in terms of the allowed orderings, I
think.  What you're asking about is a forward progress guarantee, and
1.10.25 (in C++ N3690) states:

  An implementation should ensure that the last value (in modification
  order) assigned by an atomic or synchronization operation will become
  visible to all other threads in a finite period of time.

To me, that says busy-waiting loops like the one in the example above
should eventually stop looping.  How long that takes would be a QoI
issue, I guess.
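
For reference, a rough C++11 rendering of the quoted MVI/CLI example might
look like the sketch below.  The names (A, wait_for_zero, publish_zero,
run_example) are made up for illustration, and the code says nothing about
which instructions a given backend would emit for the load or the store.
It only shows the kind of loop that 1.10.25 is about: the waiter terminates
only if the store eventually becomes visible to it.

```cpp
#include <atomic>
#include <thread>

// Hypothetical stand-in for location A, which initially contains FF hex.
std::atomic<unsigned char> A{0xFF};

// "CPU 2": spin until the store is observed; returns how many loads missed it.
unsigned long wait_for_zero() {
    unsigned long misses = 0;
    while (A.load(std::memory_order_seq_cst) != 0x00)
        ++misses;
    return misses;
}

// "CPU 1": the store that the waiter is polling for.
void publish_zero() {
    A.store(0x00, std::memory_order_seq_cst);
}

// Runs both sides; returns only if the waiter eventually saw the store.
bool run_example() {
    A.store(0xFF, std::memory_order_seq_cst);
    std::thread waiter(wait_for_zero);
    publish_zero();
    waiter.join();
    return A.load(std::memory_order_seq_cst) == 0x00;
}
```

If the implementation never made the store visible, run_example() would
hang in join() rather than return false, which is the forward progress
question in a nutshell.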

> Even if it isn't allowed, every CPU is going to get interrupted eventually,
> and I'm told that in practice all current implementations would see the
> store at some point.  In that case it might come down to a quality of
> implementation question.  Is it OK to leave out the serialisation anyway
> with a slightly vague guarantee like that?

That's a good question.  If every thread is interrupted eventually, then
it would be correct without adding a serializing instruction to each
load.  If not, we'd either have to add one on each atomic load, which
might be expensive (how expensive would it be on s390?), or we'd need to
try to add it only to loops that look like busy-waiting loops (i.e., a
loop whose termination might depend on an atomic load).
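
At the source level, the "serialize inside the loop" variant could be
approximated by hand with an explicit fence per iteration, as in the sketch
below.  This is only an illustration: the names (ready, spin_with_fence,
run_fenced_example) are invented, and I'm assuming rather than claiming
that the seq_cst fence would be lowered to a serializing instruction on
s390.

```cpp
#include <atomic>
#include <thread>

// Hypothetical flag that the waiting thread polls.
std::atomic<int> ready{0};

// Busy-wait with an explicit seq_cst fence before every load.
// Assumption: the fence becomes a serializing instruction on the target.
int spin_with_fence() {
    int seen;
    do {
        std::atomic_thread_fence(std::memory_order_seq_cst);
        seen = ready.load(std::memory_order_relaxed);
    } while (seen == 0);
    return seen;
}

// Starts the waiter, publishes the flag, and returns the value the waiter saw.
int run_fenced_example() {
    ready.store(0, std::memory_order_seq_cst);
    int result = 0;
    std::thread waiter([&result] { result = spin_with_fence(); });
    ready.store(1, std::memory_order_seq_cst);
    waiter.join();  // join makes the write to result visible here
    return result;
}
```

Doing this automatically in the compiler would mean recognizing such loops,
which is the hard part; the sketch just shows where the fence would sit.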

I'm not aware of other architectures where we'd need to be concerned
about atomic loads not picking up the most recent value eventually --
does anybody know about others or is this just a problem on s390?

Torvald
