https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

--- Comment #49 from torvald at gcc dot gnu.org ---
(In reply to James Greenhalgh from comment #43)
> (In reply to torvald from comment #37)
> > (In reply to James Greenhalgh from comment #35)
> > > So by the strict letter of the specification, no memory references to
> > > visible data should be allowed to move from after the entire body of the
> > > intrinsic to before it. That is to say in:
> > > 
> > >   __sync_lock_test_and_set (foo, 1)
> > >   bar = 1
> > > 
> > > an observer should not be able to observe the write to bar before the
> > > write to foo. This is a difference from the C++11 semantics.
> > 
> > Can you clarify how this observer would look like?  I think we should look
> > at both code examples that just use __sync for concurrent accesses, and
> > examples that also use normal memory accesses as if those would be
> > guaranteed to be atomic.  None of the __sync docs nor the psABI guarantee
> > any of the latter AFAIK.  I don't think it would be unreasonable to argue
> > that __sync doesn't make promises about non-DRF normal accesses, and so,
> > strictly speaking, maybe programs can't in fact rely on any of the behaviors
> > that are complicated to implement for ARM.  That's why I'd like to
> > distinguish those two cases.
> 
> Sure, it would look much like your cppmem example above, though you can add
> some barriers to the observer if you want to make a stronger example
> 
> bar = 0, foo = 0;
> 
> thread_a {
>   __sync_lock_test_and_set (foo, 1)
>   bar = 1
> }
> 
> thread_b {
>   /* If we can see the write to bar, the write
>      to foo must also have happened.  */
>   if (bar) /* Reads 1.  */
>    assert (foo) /* Should never read 0.  */
> }

This is the case of allowing non-DRF normal accesses.  The *other* case I was
thinking about is what the test would have to look like when *not* allowing
them.  One way to do it would be:

thread_a {
  __sync_lock_test_and_set (foo, 1)
  __sync_lock_test_and_set (bar, 1) // or __sync_lock_release, or __sync RMW
}

thread_b {
  if (__sync_fetch_and_add (bar, 0))
    assert (foo)  // DRF if thread_a's write is the final one
}

In this case, would the current ARM implementation still produce insufficient
code?  If not, at least in this test case, we could argue that there's nothing
wrong with what ARM does.  (The question of whether we want to require DRF
strictly for __sync usage is of course still open.)
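
For anyone who wants to try this on real hardware, here is a rough sketch of
the DRF variant above as a compilable C harness.  The harness itself is my
assumption and not part of the original test: the names run_drf_litmus and
violation are hypothetical, pthreads is used only to get two threads, and a
fresh pair of threads is started per iteration so that resetting foo and bar
is ordered by pthread_join.  It counts how often thread_b sees bar set but
foo still 0.  A count of zero proves nothing about the implementation; a
nonzero count would be the behavior in question.

```c
#include <pthread.h>
#include <stddef.h>

static int foo, bar;    /* the shared locations from the example */
static int violation;   /* written by thread_b, read by the driver after join */

static void *thread_a(void *arg)
{
    (void)arg;
    __sync_lock_test_and_set(&foo, 1);
    __sync_lock_test_and_set(&bar, 1);  /* or __sync_lock_release, or a __sync RMW */
    return NULL;
}

static void *thread_b(void *arg)
{
    (void)arg;
    /* fetch_and_add of 0 serves as an atomic read of bar via a __sync RMW. */
    if (__sync_fetch_and_add(&bar, 0)) {
        /* Stands in for assert (foo): record the failure instead of aborting. */
        if (foo == 0)
            violation = 1;
    }
    return NULL;
}

/* Hypothetical driver: runs the two threads `iterations` times and returns
   how many runs observed bar != 0 with foo == 0, or -1 if a thread could
   not be created.  */
long run_drf_litmus(long iterations)
{
    long violations = 0;

    for (long i = 0; i < iterations; i++) {
        pthread_t a, b;

        /* No other thread is running here, so plain stores are fine. */
        foo = 0;
        bar = 0;
        violation = 0;

        if (pthread_create(&a, NULL, thread_a, NULL) != 0)
            return -1;
        if (pthread_create(&b, NULL, thread_b, NULL) != 0) {
            pthread_join(a, NULL);
            return -1;
        }
        pthread_join(a, NULL);
        pthread_join(b, NULL);

        violations += violation;
    }
    return violations;
}
```

Calling run_drf_litmus from a small main and building with gcc -O2 -pthread
should be enough to run it.  Creating threads per iteration keeps the window
for the race small, so a harness that reuses long-lived threads would likely
stress the ordering harder, at the cost of extra synchronization that could
itself mask the effect.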

> > > I'm not worried about __sync_lock_release, I think the documentation is
> > > strong enough and unambiguous.
> > 
> > Are you aware that the GCC's __sync disallow store-store reordering across
> > __sync_lock_release, whereas the psABI docs don't?
> 
> No I was not, and even looking exactly for the text you were referring to,
> it took me three attempts to spot it. Yes, I understand now why you are
> concerned about the GCC wording. Perhaps this is just an artefact of a
> mistake transcribing the psABI?

I suppose so, but can't say for sure.  Somebody might have had x86 TSO in mind
when writing that, and perhaps not just accidentally.

However, we say psABI is the reference spec, so I agree we should change the
GCC __sync_lock_release docs accordingly.
