On 19/02/16 15:24, Kyrill Tkachov wrote:
> Hi all,
> 
> The atomic_loaddi expander on arm has some issues and can benefit from a 
> rewrite to properly
> perform double-word atomic loads on various architecture levels.
> 
> Consider the code:
> ----------------------
> #include <stdatomic.h>
> 
> atomic_ullong foo;
> 
> int glob;
> 
> int main(void) {
>     atomic_load_explicit(&foo, memory_order_acquire);
>     return glob;
> }
> ---------------------
> 
> Compiled with -O2 -march=armv7-a -std=c11 this gives:
>         movw    r3, #:lower16:glob
>         movt    r3, #:upper16:glob
>         dmb     ish
>         movw    r2, #:lower16:foo
>         movt    r2, #:upper16:foo
>         ldrexd  r0, r1, [r2]
>         ldr     r0, [r3]
>         bx      lr
> 
> For the acquire memory model the barrier should be after the ldrexd, not 
> before.
> The same code is generated when compiled with -march=armv7ve. However, we can 
> get away with a single LDRD
> on such systems. In issue C.c of The ARM Architecture Reference Manual for 
> ARMv7-A and ARMv7-R
> recommends at chapter A3.5.3:
> "In an implementation that includes the Large Physical Address Extension, 
> LDRD and STRD accesses to 64-bit aligned
> locations are 64-bit single-copy atomic".
> We still need the barrier after the LDRD to enforce the acquire ordering 
> semantics.
> 
> For ARMv8-A we can do even better and use the load double-word acquire 
> instruction: LDAEXD, with no need for
> a barrier afterwards.
> 
> I've discussed the required sequences with some kernel folk and had a read 
> through:
> https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
> and this is the patch I've come up with.
> 
> This patch handles all three of the above cases by rewriting the 
> atomic_loaddi expander.
> With this patch for the above code with -march=armv7-a we would now generate:
>         movw    r3, #:lower16:foo
>         movt    r3, #:upper16:foo
>         ldrexd  r0, r1, [r3]
>         movw    r3, #:lower16:glob
>         movt    r3, #:upper16:glob
>         dmb     ish
>         ldr     r0, [r3]
>         bx      lr
> 
> For -march=armv7ve:
>         movw    r3, #:lower16:foo
>         movt    r3, #:upper16:foo
>         ldrd    r2, r3, [r3]
>         movw    r3, #:lower16:glob
>         movt    r3, #:upper16:glob
>         dmb     ish
>         ldr     r0, [r3]
>         bx      lr
> 
> and for -march=armv8-a:
>         movw    r3, #:lower16:foo
>         movt    r3, #:upper16:foo
>         ldaexd  r2, r3, [r3]
>         movw    r3, #:lower16:glob
>         movt    r3, #:upper16:glob
>         ldr     r0, [r3]
>         bx      lr
> 
> For the relaxed memory model the armv7ve and armv8-a can be relaxed to a 
> single
> LDRD instruction, without any barriers.
> 
> Bootstrapped and tested on arm-none-linux-gnueabihf.
> 
> Ok for trunk?


This is OK and needed for the release branches. Thanks.

regards
Ramana
> 
> Thanks,
> Kyrill
> 
> P.S. The backport to the previous branches will look a bit different because 
> the
> ARM_FSET_HAS_CPU1 machinery in arm.h was introduced for GCC 6. I'll prepare a 
> backport
> separately if this is accepted.
> 
> 2016-02-19  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
> 
>     PR target/69875
>     * config/arm/arm.h (TARGET_HAVE_LPAE): Define.
>     * config/arm/unspecs.md (VUNSPEC_LDRD_ATOMIC): New value.
>     * config/arm/sync.md (arm_atomic_loaddi2_ldrd): New pattern.
>     (atomic_loaddi_1): Delete.
>     (atomic_loaddi): Rewrite expander using the above changes.
> 
> 2016-02-19  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
> 
>     PR target/69875
>     * gcc.target/arm/atomic_loaddi_acquire.x: New file.
>     * gcc.target/arm/atomic_loaddi_relaxed.x: Likewise.
>     * gcc.target/arm/atomic_loaddi_seq_cst.x: Likewise.
>     * gcc.target/arm/atomic_loaddi_1.c: New test.
>     * gcc.target/arm/atomic_loaddi_2.c: Likewise.
>     * gcc.target/arm/atomic_loaddi_3.c: Likewise.
>     * gcc.target/arm/atomic_loaddi_4.c: Likewise.
>     * gcc.target/arm/atomic_loaddi_5.c: Likewise.
>     * gcc.target/arm/atomic_loaddi_6.c: Likewise.
>     * gcc.target/arm/atomic_loaddi_7.c: Likewise.
>     * gcc.target/arm/atomic_loaddi_8.c: Likewise.
>     * gcc.target/arm/atomic_loaddi_9.c: Likewise.

Reply via email to