http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50065
--- Comment #7 from Eric Botcazou <ebotcazou at gcc dot gnu.org> 2011-08-14 13:00:06 UTC ---
> I don't think this is an valid optimization.
>
> There are only two memory models in SPARC32, TSO and PSO (not RMO in the
> 64-bit v9). Both don't allow relaxing the read->write order, i.e.
> 'LD remap_barrier' should always be executed before 'ST lock'.
>
> This optimization violates the memory model, therefore should be prohibited.

You're apparently confusing 2 different concepts:

 1. What an optimizing C compiler is permitted to do. This is defined by the
    ISO Standard in terms of an abstract machine that is somewhat simplistic.
    In particular, there is no concept of concurrency or memory model, and the
    whole thing is essentially target-independent. The kind of reordering we
    have here is allowed by the Standard as it doesn't change the "external
    state" of the abstract machine.

 2. The memory model implemented by the SPARC processor, under which loads and
    stores can be reordered, even though the compiler itself doesn't reorder
    them.

A proper implementation of spinlocks needs to take them both into account.

For the first part, you need a compiler memory barrier, i.e.:

  __asm__ __volatile__ ("" : : : "memory");

For the second part, you need a processor memory barrier, i.e. to put a stbar
instruction if you're running PSO, plus an atomic instruction that is the only
memory barrier available in V8 for TSO.
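
To make the two parts concrete, here is a minimal sketch of a spinlock that combines both kinds of barrier. It is an illustration only, not the code from the bug report: the names demo_spinlock_t, demo_spin_lock, demo_spin_unlock, compiler_barrier and store_barrier are hypothetical, and the use of GCC's __sync_lock_test_and_set builtin as the atomic instruction is an assumption (how it is lowered on a given SPARC V8 configuration is not covered here). On non-SPARC targets the stbar path is compiled out.

```c
/* Part 1: compiler-only barrier. Emits no instruction; it only stops GCC
   from moving memory accesses across this point. */
#define compiler_barrier() __asm__ __volatile__ ("" : : : "memory")

/* Part 2: processor store barrier. stbar is only needed under PSO on SPARC;
   elsewhere this sketch falls back to the compiler barrier (assumption). */
#if defined(__sparc__)
#define store_barrier() __asm__ __volatile__ ("stbar" : : : "memory")
#else
#define store_barrier() compiler_barrier()
#endif

/* Hypothetical lock type for illustration. */
typedef struct { volatile int locked; } demo_spinlock_t;

static void demo_spin_lock(demo_spinlock_t *l)
{
    /* Atomic exchange: writes 1 and returns the previous value, so a
       nonzero result means somebody else already holds the lock. */
    while (__sync_lock_test_and_set(&l->locked, 1)) {
        /* Spin on plain reads until the lock looks free, then retry. */
        while (l->locked)
            compiler_barrier();
    }
    /* Keep the critical section from being hoisted above the acquire. */
    compiler_barrier();
}

static void demo_spin_unlock(demo_spinlock_t *l)
{
    /* Order the critical section's stores before the releasing store. */
    store_barrier();
    l->locked = 0;
    compiler_barrier();
}
```

The compiler barrier alone would not help against hardware reordering, and the atomic instruction alone would not stop the compiler from moving the load of remap_barrier past the store to the lock; the sketch needs both for that reason.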