[Bug target/65697] __atomic memory barriers not strong enough for __sync builtins

jgreenhalgh at gcc dot gnu.org Mon, 11 May 2015 06:45:04 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697


--- Comment #55 from James Greenhalgh <jgreenhalgh at gcc dot gnu.org> ---
(In reply to torvald from comment #49)
> > bar = 0, foo = 0;
> > 
> > thread_a {
> >   __sync_lock_test_and_set (foo, 1)
> >   bar = 1
> > }
> > 
> > thread_b {
> >   /* If we can see the write to bar, the write
> >      to foo must also have happened.  */
> >   if (bar) /* Reads 1.  */
> >    assert (foo) /* Should never read 0.  */
> > }
> 
> This is the case of allowing non-DRF normal accesses.  The *other* case I
> was thinking about is what the test would have to look like when *not*
> allowing them.  One way to do it would be:
> 
> thread_a {
>   __sync_lock_test_and_set (foo, 1)
>   __sync_lock_test_and_set (bar, 1) // or __sync_lock_release, or __sync RMW
> }
> 
> thread_b {
>   if (__sync_fetch_and_add (bar, 0))
>     assert (foo)  // DRF if thread_a's write is the final one
> }
> 
> In this case, would the current ARM implementation still produce
> insufficient code?  If not, at least in this test case, we could argue that
> there's nothing wrong with what ARM does.  (The question whether we want to
> require DRF strictly for __sync usage is of course still open.)

In this case, the current implementation would be fine. Thread A looks like
this:

thread_a:
        adrp    x0, foo
        mov     w1, 1
        ldr     x0, [x0, #:lo12:foo]
.L2:
        ldaxr   w2, [x0] /* Load acquire foo.  */
        stxr    w3, w1, [x0] /* Store release foo.  */
        cbnz    w3, .L2 /* Branch if not exclusive access.  */
        adrp    x0, bar
        ldr     x0, [x0, #:lo12:bar]
.L3:
        ldaxr   w2, [x0] /* Load acquire bar.  */
        stxr    w3, w1, [x0] /* Store release bar.  */
        cbnz    w3, .L3
        ret

And the architecture gives a specific requirement on the ordering of
store-release and load-acquire:

A Store-Release followed by a Load-Acquire is observed in program order by any
observers that are in both:
  — The shareability domain of the address accessed by the Store-Release.
  — The shareability domain of the address accessed by the Load-Acquire.

So yes, I think in this case we could argue that there is nothing wrong with
what ARM does. However, I would expect the non-DRF code to be much more common
in the wild, so I think we still need to deal with this issue. (It is a shame
that the DRF code you provided will suffer from an extra barrier if
Matthew/Andrew's work is applied, but I think this is a corner case which we
probably don't want to put too much thought into working around.)
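
For anyone who wants to try the DRF variant on real hardware, a minimal
harness might look like the sketch below. It is not part of the bug report:
the helper run_drf_trials, the out-parameter passed to thread_b and the use
of pthreads are all assumptions made for illustration. Only the two thread
bodies follow the test case quoted above.

```c
#include <pthread.h>
#include <stddef.h>

/* Shared flags, named after the variables in the quoted test case. */
static int foo, bar;

static void *thread_a(void *arg)
{
    (void)arg;
    __sync_lock_test_and_set(&foo, 1);
    __sync_lock_test_and_set(&bar, 1);
    return NULL;
}

static void *thread_b(void *arg)
{
    int *saw_violation = arg;   /* hypothetical out-parameter */

    /* Read bar through a __sync RMW, as in the quoted test case. */
    if (__sync_fetch_and_add(&bar, 0)) {
        /* If the write to bar is visible, foo must already be 1. */
        if (foo == 0)
            *saw_violation = 1;
    }
    return NULL;
}

/* Hypothetical helper: runs the two threads `trials` times.
   Returns the number of trials in which thread_b saw bar != 0 but
   foo == 0, or -1 if a thread could not be created.  */
static int run_drf_trials(int trials)
{
    int violations = 0;

    for (int i = 0; i < trials; i++) {
        pthread_t a, b;
        int saw_violation = 0;

        /* Reset before the threads exist, so these plain stores
           cannot race with anything.  */
        foo = 0;
        bar = 0;

        if (pthread_create(&a, NULL, thread_a, NULL) != 0)
            return -1;
        if (pthread_create(&b, NULL, thread_b, &saw_violation) != 0) {
            pthread_join(a, NULL);
            return -1;
        }
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        violations += saw_violation;
    }
    return violations;
}
```

Calling run_drf_trials (1000) from main and building with -pthread should
return 0 if the ordering holds. A clean run is only weak evidence, since a
weak-memory reordering may simply fail to show up in a given number of trials.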
