Sorry for the long mail and for what's probably an FAQ.  I did try to find
an answer without bothering the list... (and showing my ignorance so much :-))

At the moment, the s390 backend treats all atomic loads as simple loads
and only uses serialisation instructions for atomic stores.  I just wanted
to check whether this was really the right behaviour.
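In C++11 terms, the two operations in question look roughly like the sketch below. The names `flag`, `publish` and `observe` are made up for illustration. The comments only restate the mapping described above and are not taken from the backend source.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical shared location, used only to illustrate the two operations.
std::atomic<int> flag{0};

// A sequentially-consistent atomic store. As described above, the s390
// backend emits this as a store followed by a serialising instruction
// (the manual's example below uses BCR 15,0 for that purpose).
void publish(int value) {
    flag.store(value, std::memory_order_seq_cst);
}

// A sequentially-consistent atomic load. As described above, the backend
// currently emits this as a plain load with no serialisation at all.
int observe() {
    return flag.load(std::memory_order_seq_cst);
}
```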

The architecture has strong memory-ordering semantics in which a CPU is
only allowed to move a store after a later load; the other three combinations
cannot happen.  The current implementation seems fine from that point of view,
because it means that a serialising instruction after a store is enough
to prevent any reordering.  However, page 5-126 of the architecture
manual[*] says:

  Following is an example showing the effects of serialization. Location
  A initially contains FF hex.

  CPU 1                  CPU 2
  MVI A,X'00'       G    CLI A,X'00'
  BCR 15,0               BNE G

  The BCR 15,0 instruction executed by CPU 1 is a serializing
  instruction that ensures that the store by CPU 1 at location A is
  completed. However, CPU 2 may loop indefinitely, or until the next
  interruption on CPU 2, because CPU 2 may already have fetched from
  location A for every execution of the CLI instruction. A serializing
  instruction must be in the CPU-2 loop to ensure that CPU 2 will again
  fetch from location A.
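A rough C++ analogue of the manual's example, assuming location A becomes a `std::atomic<unsigned char>` and both CPU sequences use seq_cst operations (`A`, `cpu1`, `cpu2` and `run_example` are hypothetical names). With the current mapping, the load in `cpu2` would be a plain load with no serialising instruction inside the loop, which is the situation the manual warns about.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Location A from the manual's example, initially FF hex.
std::atomic<unsigned char> A{0xFF};

// CPU 1: MVI A,X'00' followed by BCR 15,0, i.e. a seq_cst store of zero.
void cpu1() {
    A.store(0x00, std::memory_order_seq_cst);
}

// CPU 2: G CLI A,X'00' / BNE G, i.e. spin until a seq_cst load sees zero.
// Returns how many loads still observed a non-zero value.
unsigned long cpu2() {
    unsigned long stale_loads = 0;
    while (A.load(std::memory_order_seq_cst) != 0x00) {
        ++stale_loads;
    }
    return stale_loads;
}

// Runs both "CPUs" as threads. Only returns once CPU 2's loop has exited.
unsigned long run_example() {
    A.store(0xFF, std::memory_order_seq_cst);
    unsigned long stale = 0;
    std::thread t2([&stale] { stale = cpu2(); });
    std::thread t1(cpu1);
    t1.join();
    t2.join();
    return stale;
}
```

On real hardware the loop will normally exit, but a terminating run says nothing about whether the memory model requires it to.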

Does the new C/C++ memory model allow that kind of infinite loop
even for sequentially consistent atomic loads?  The draft text was:

[29.3.3]
  There shall be a single total order S on all memory_order_seq_cst
  operations, consistent with the “happens before” order and modification
  orders for all affected locations, such that each memory_order_seq_cst
  operation that loads a value observes either the last preceding
  modification according to this order S, or the result of an operation
  that is not memory_order_seq_cst.
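The store-then-load reordering mentioned earlier is what the classic store-buffering litmus test probes. The sketch below brute-forces every interleaving of its four operations, treating each interleaving as a candidate total order S in which every load reads the last preceding store, as in the quoted text. It is a toy model assuming only seq_cst operations on two locations (`Op`, `kThreads` and `sc_outcomes` are invented names) and ignores the rest of the standard.

```cpp
#include <bitset>
#include <cassert>
#include <set>
#include <utility>

// One memory operation in the store-buffering test.
struct Op {
    bool is_store;  // stores always write 1
    int var;        // 0 = x, 1 = y
    int reg;        // destination register for loads (0 = r0, 1 = r1), -1 for stores
};

// Program order per thread:
//   thread 0: x = 1; r0 = y;
//   thread 1: y = 1; r1 = x;
const Op kThreads[2][2] = {
    {{true, 0, -1}, {false, 1, 0}},
    {{true, 1, -1}, {false, 0, 1}},
};

// Enumerates every interleaving that respects program order and records the
// resulting (r0, r1) pair. Each load returns the last store to its variable
// that precedes it in the interleaving, or the initial value 0.
std::set<std::pair<int, int>> sc_outcomes() {
    std::set<std::pair<int, int>> outcomes;
    // Bit `step` of mask says which thread runs at that step; exactly two
    // steps must belong to each thread.
    for (int mask = 0; mask < 16; ++mask) {
        if (std::bitset<4>(mask).count() != 2) continue;
        int mem[2] = {0, 0};   // x, y
        int regs[2] = {0, 0};  // r0, r1
        int next[2] = {0, 0};  // next op index for each thread
        for (int step = 0; step < 4; ++step) {
            int t = (mask >> step) & 1;
            const Op& op = kThreads[t][next[t]++];
            if (op.is_store) {
                mem[op.var] = 1;
            } else {
                regs[op.reg] = mem[op.var];
            }
        }
        outcomes.insert({regs[0], regs[1]});
    }
    return outcomes;
}
```

Under that rule the outcome r0 == r1 == 0 never appears, which is why a machine that lets a load overtake an earlier store needs a serialising instruction after each seq_cst store.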

but when I asked around, no one could see anything in the standard that
prevents the total order from having an infinite sequence of loads
between two stores.  That feels like a cheat though. :-)
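To make that concern concrete, here is a toy checker for the "last preceding modification" clause, restricted to a single location. It ignores the happens-before and non-seq_cst parts of the rule, and `Event`, `consistent`, `stale_prefix` and `stale_after_store` are invented names. Any finite run of CPU 2's loads placed before CPU 1's store passes, while a stale load placed after the store fails. A finite check like this cannot settle whether an infinite prefix of loads is permitted; it only illustrates that the per-load rule by itself places no bound on the prefix.

```cpp
#include <cassert>
#include <vector>

// One event in a candidate total order S over a single location.
struct Event {
    bool is_store;
    int value;  // value written (store) or value returned (load)
};

// Every load must return the value of the last store preceding it in S,
// or the initial value if no store precedes it.
bool consistent(const std::vector<Event>& order, int initial) {
    int current = initial;
    for (const Event& e : order) {
        if (e.is_store) {
            current = e.value;
        } else if (e.value != current) {
            return false;
        }
    }
    return true;
}

// n executions of CPU 2's CLI (each reading FF hex), then CPU 1's store of 0.
std::vector<Event> stale_prefix(int n) {
    std::vector<Event> order(n, Event{false, 0xFF});
    order.push_back(Event{true, 0x00});
    return order;
}

// CPU 1's store of 0 first, then n loads that still claim to read FF hex.
std::vector<Event> stale_after_store(int n) {
    std::vector<Event> order{Event{true, 0x00}};
    order.insert(order.end(), n, Event{false, 0xFF});
    return order;
}
```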

Even if it isn't allowed, every CPU is going to get interrupted eventually,
and I'm told that in practice all current implementations would see the
store at some point.  In that case it might come down to a quality of
implementation question.  Is it OK to leave out the serialisation anyway
with a slightly vague guarantee like that?

Thanks,
Richard

[*] Available here FWIW: 
http://www-01.ibm.com/support/docview.wss?uid=isg2b9de5f05a9d57819852571c500428f9a
