On 19/02/16 15:24, Kyrill Tkachov wrote: > Hi all, > > The atomic_loaddi expander on arm has some issues and can benefit from a > rewrite to properly > perform double-word atomic loads on various architecture levels. > > Consider the code: > ---------------------- > #include <stdatomic.h> > > atomic_ullong foo; > > int glob; > > int main(void) { > atomic_load_explicit(&foo, memory_order_acquire); > return glob; > } > --------------------- > > Compiled with -O2 -march=armv7-a -std=c11 this gives: > movw r3, #:lower16:glob > movt r3, #:upper16:glob > dmb ish > movw r2, #:lower16:foo > movt r2, #:upper16:foo > ldrexd r0, r1, [r2] > ldr r0, [r3] > bx lr > > For the acquire memory model the barrier should be after the ldrexd, not > before. > The same code is generated when compiled with -march=armv7ve. However, we can > get away with a single LDRD > on such systems. In issue C.c of The ARM Architecture Reference Manual for > ARMv7-A and ARMv7-R > recommends at chapter A3.5.3: > "In an implementation that includes the Large Physical Address Extension, > LDRD and STRD accesses to 64-bit aligned > locations are 64-bit single-copy atomic". > We still need the barrier after the LDRD to enforce the acquire ordering > semantics. > > For ARMv8-A we can do even better and use the load double-word acquire > instruction: LDAEXD, with no need for > a barrier afterwards. > > I've discussed the required sequences with some kernel folk and had a read > through: > https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html > and this is the patch I've come up with. > > This patch handles all three of the above cases by rewriting the > atomic_loaddi expander. > With this patch for the above code with -march=armv7-a we would now generate: > movw r3, #:lower16:foo > movt r3, #:upper16:foo > ldrexd r0, r1, [r3] > movw r3, #:lower16:glob > movt r3, #:upper16:glob > dmb ish > ldr r0, [r3] > bx lr > > For -march=armv7ve: > movw r3, #:lower16:foo > movt r3, #:upper16:foo > ldrd r2, r3, [r3] > movw r3, #:lower16:glob > movt r3, #:upper16:glob > dmb ish > ldr r0, [r3] > bx lr > > and for -march=armv8-a: > movw r3, #:lower16:foo > movt r3, #:upper16:foo > ldaexd r2, r3, [r3] > movw r3, #:lower16:glob > movt r3, #:upper16:glob > ldr r0, [r3] > bx lr > > For the relaxed memory model the armv7ve and armv8-a can be relaxed to a > single > LDRD instruction, without any barriers. > > Bootstrapped and tested on arm-none-linux-gnueabihf. > > Ok for trunk?
This is OK and needed for the release branches. Thanks. regards Ramana > > Thanks, > Kyrill > > P.S. The backport to the previous branches will look a bit different because > the > ARM_FSET_HAS_CPU1 machinery in arm.h was introduced for GCC 6. I'll prepare a > backport > separately if this is accepted. > > 2016-02-19 Kyrylo Tkachov <kyrylo.tkac...@arm.com> > > PR target/69875 > * config/arm/arm.h (TARGET_HAVE_LPAE): Define. > * config/arm/unspecs.md (VUNSPEC_LDRD_ATOMIC): New value. > * config/arm/sync.md (arm_atomic_loaddi2_ldrd): New pattern. > (atomic_loaddi_1): Delete. > (atomic_loaddi): Rewrite expander using the above changes. > > 2016-02-19 Kyrylo Tkachov <kyrylo.tkac...@arm.com> > > PR target/69875 > * gcc.target/arm/atomic_loaddi_acquire.x: New file. > * gcc.target/arm/atomic_loaddi_relaxed.x: Likewise. > * gcc.target/arm/atomic_loaddi_seq_cst.x: Likewise. > * gcc.target/arm/atomic_loaddi_1.c: New test. > * gcc.target/arm/atomic_loaddi_2.c: Likewise. > * gcc.target/arm/atomic_loaddi_3.c: Likewise. > * gcc.target/arm/atomic_loaddi_4.c: Likewise. > * gcc.target/arm/atomic_loaddi_5.c: Likewise. > * gcc.target/arm/atomic_loaddi_6.c: Likewise. > * gcc.target/arm/atomic_loaddi_7.c: Likewise. > * gcc.target/arm/atomic_loaddi_8.c: Likewise. > * gcc.target/arm/atomic_loaddi_9.c: Likewise.