https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697
--- Comment #55 from James Greenhalgh <jgreenhalgh at gcc dot gnu.org> ---
(In reply to torvald from comment #49)
> > bar = 0, foo = 0;
> >
> > thread_a {
> >   __sync_lock_test_and_set (foo, 1)
> >   bar = 1
> > }
> >
> > thread_b {
> >   /* If we can see the write to bar, the write
> >      to foo must also have happened.  */
> >   if (bar) /* Reads 1.  */
> >     assert (foo) /* Should never read 0.  */
> > }
>
> This is the case of allowing non-DRF normal accesses.  The *other* case I
> was thinking about is how the test would have to look like when *not*
> allowing them.  One way to do it would be:
>
> thread_a {
>   __sync_lock_test_and_set (foo, 1)
>   __sync_lock_test_and_set (bar, 1) // or __sync_lock_release, or __sync RMW
> }
>
> thread_b {
>   if (__sync_fetch_and_add (bar, 0))
>     assert (foo)  // DRF if thread_a's write is the final one
> }
>
> In this case, would the current ARM implementation still produce
> insufficient code?  If not, at least in this test case, we could argue that
> there's nothing wrong with what ARM does.  (The question whether we want to
> require DRF strictly for __sync usage is of course still open.)

In this case, the current implementation would be fine. Thread A looks like
this:

thread_a:
    adrp    x0, foo
    mov     w1, 1
    ldr     x0, [x0, #:lo12:foo]
.L2:
    ldaxr   w2, [x0]        /* Load acquire foo.  */
    stxr    w3, w1, [x0]    /* Store release foo.  */
    cbnz    w3, .L2         /* Branch if not exclusive access.  */
    adrp    x0, bar
    ldr     x0, [x0, #:lo12:bar]
.L3:
    ldaxr   w2, [x0]        /* Load acquire bar.  */
    stxr    w3, w1, [x0]    /* Store release bar.  */
    cbnz    w3, .L3
    ret

And the architecture gives a specific requirement on the ordering of
store-release and load-acquire:

    A Store-Release followed by a Load-Acquire is observed in program order
    by any observers that are in both:
      - The shareability domain of the address accessed by the Store-Release.
      - The shareability domain of the address accessed by the Load-Acquire.

So yes, I think in this case we could argue that there is nothing wrong with
what ARM does. However, I would expect the non-DRF code to be much more common
in the wild, so I think we still need to deal with this issue.

(It is a shame that the DRF code you provided will suffer from an extra barrier
if Matthew/Andrew's work is applied, but I think this is a corner case which we
probably don't want to put too much thought into working around.)
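
For anyone who wants to try the DRF variant quoted from comment #49 on real
hardware, here is a minimal sketch of it as a pthreads harness. It is only an
illustration, not something taken from the GCC testsuite: the helper names
run_drf_litmus_once and check_drf_litmus are made up for this example, and
thread_b spins on bar rather than testing it once with `if`, so that every run
actually reaches the read of foo. A passing run shows nothing about the
architecture's guarantees; only a failing one would be interesting.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Shared state for the litmus test; names follow the quoted example. */
static int foo, bar;

static void *thread_a(void *arg)
{
    (void)arg;
    __sync_lock_test_and_set(&foo, 1);
    __sync_lock_test_and_set(&bar, 1);
    return NULL;
}

static void *thread_b(void *arg)
{
    int *seen_foo = arg;

    /* Assumption: spin until the write to bar is visible, instead of the
       single `if` in the quoted test, so each run performs the check. */
    while (__sync_fetch_and_add(&bar, 0) == 0)
        ;
    *seen_foo = foo;    /* Plain read, as in `assert (foo)`. */
    return NULL;
}

/* Hypothetical helper: run the test once and return the value of foo that
   thread_b observed, or -1 if a thread could not be created. */
int run_drf_litmus_once(void)
{
    pthread_t a, b;
    int seen_foo = -1;

    foo = 0;
    bar = 0;
    if (pthread_create(&b, NULL, thread_b, &seen_foo) != 0)
        return -1;
    if (pthread_create(&a, NULL, thread_a, NULL) != 0) {
        /* Release the spinning thread_b before giving up. */
        __sync_lock_test_and_set(&bar, 1);
        pthread_join(b, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return seen_foo;
}

/* Hypothetical helper: repeat the test; thread_b should never read 0. */
void check_drf_litmus(int runs)
{
    for (int i = 0; i < runs; i++)
        assert(run_drf_litmus_once() == 1);
}
```

Calling check_drf_litmus (100000) from main and building with
`gcc -O2 -pthread` should be enough to exercise it; the iteration count is
arbitrary.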