https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697
--- Comment #56 from mwahab at gcc dot gnu.org ---
(In reply to James Greenhalgh from comment #55)
> (In reply to torvald from comment #49)
>
> > This is the case of allowing non-DRF normal accesses. The *other* case I
> > was thinking about is how the test would have to look like when *not*
> > allowing them. One way to do it would be:
> >
> > thread_a {
> >   __sync_lock_test_and_set (foo, 1)
> >   __sync_lock_test_and_set (bar, 1) // or __sync_lock_release, or __sync RMW
> > }
>
> [..] (it is
> a shame that the DRF code you provided will suffer from an extra barrier if
> Matthew/Andrew's work is applied, but I think this is a corner case which we
> probably don't want to put too much thought in to working around).

I'm not familiar with the code but I would have thought that it would be
straightforward to optimize away the first dmb. It seems like it would be a
simple peephole to spot a sequence of two atomic operations, both with __sync
barriers, and replace the first with the equivalent __atomic barrier.
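
As a rough source-level illustration of that idea (not how GCC implements
anything internally), the before and after of such a peephole might look like
the sketch below. The function names are hypothetical, and it assumes that
__ATOMIC_ACQUIRE is the "equivalent __atomic barrier" for
__sync_lock_test_and_set, which the GCC manual documents as an acquire
barrier. Whether dropping the first __sync-strength barrier is actually safe
is the question under discussion in this bug, so treat this as a picture of
the proposal rather than a verified transformation.

```c
/* Hypothetical helper names; only the builtins themselves are real GCC API. */

/* Before: two back-to-back exchanges, both through the __sync builtin, so
   each one carries its own __sync-strength barrier. */
int set_both_sync(int *foo, int *bar)
{
    int old_foo = __sync_lock_test_and_set(foo, 1);
    __sync_lock_test_and_set(bar, 1);
    return old_foo;
}

/* After (assumed result of the proposed peephole): the first exchange is
   weakened to the plain __atomic acquire form, and only the second keeps the
   __sync builtin and whatever stronger barrier it implies. */
int set_both_peephole(int *foo, int *bar)
{
    int old_foo = __atomic_exchange_n(foo, 1, __ATOMIC_ACQUIRE);
    __sync_lock_test_and_set(bar, 1);
    return old_foo;
}
```

Both functions store 1 to foo and bar and return the previous value of foo;
they differ only in the ordering guarantee attached to the first exchange.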