[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel CPUs with AVX

cvs-commit at gcc dot gnu.org via Gcc-bugs Mon, 28 Mar 2022 22:59:29 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688


--- Comment #7 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-11 branch has been updated by Jakub Jelinek
<ja...@gcc.gnu.org>:

https://gcc.gnu.org/g:1861b9a9f13c64333306a2eb146b2da0a41d044f

commit r11-9729-g1861b9a9f13c64333306a2eb146b2da0a41d044f
Author: Jakub Jelinek <ja...@redhat.com>
Date:   Thu Mar 17 18:49:00 2022 +0100

    libatomic: Improve 16-byte atomics on Intel AVX [PR104688]

    As mentioned in the PR, the latest Intel SDM has added:
    "Processors that enumerate support for Intel® AVX (by setting the feature
    flag CPUID.01H:ECX.AVX[bit 28]) guarantee that the 16-byte memory
    operations performed by the following instructions will always be
    carried out atomically:
    • MOVAPD, MOVAPS, and MOVDQA.
    • VMOVAPD, VMOVAPS, and VMOVDQA when encoded with VEX.128.
    • VMOVAPD, VMOVAPS, VMOVDQA32, and VMOVDQA64 when encoded with EVEX.128
      and k0 (masking disabled).
    (Note that these instructions require the linear addresses of their memory
    operands to be 16-byte aligned.)"

    The following patch deals with it just on the libatomic library side so
    far; currently (since ~ 2017) we emit all the __atomic_* 16-byte builtins
    as library calls, so this is something that we can hopefully backport.

    The patch simply introduces yet another ifunc variant that takes priority
    over the pure CMPXCHG16B one, one that checks AVX and CMPXCHG16B bits and
    on non-Intel clears the AVX bit during detection for now (if AMD comes
    with the same guarantee, we could revert the config/x86/init.c hunk),
    which implements 16-byte atomic load as vmovdqa and 16-byte atomic store
    as vmovdqa followed by mfence.
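
The load and store sequences described above can be sketched roughly as
follows. This is an illustration only, not the code from the patch: the
names sketch_u128, sketch_fast_path_usable, sketch_try_load16 and
sketch_try_store16 are hypothetical, the runtime check uses the GCC/Clang
builtins __builtin_cpu_is and __builtin_cpu_supports rather than libatomic's
own feature detection, and the CMPXCHG16B check is omitted. On anything
other than x86_64 with a GNU-compatible compiler, the helpers simply report
that the fast path is unavailable.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte value type; _Alignas gives every instance the
   16-byte alignment that (V)MOVDQA requires. */
typedef struct { _Alignas(16) uint64_t lo; uint64_t hi; } sketch_u128;

/* Assumption: the fast path is only attempted on Intel CPUs with AVX,
   mirroring the vendor restriction described in the commit message. */
static int sketch_fast_path_usable(void)
{
#if defined(__x86_64__) && defined(__GNUC__)
    return __builtin_cpu_is("intel") && __builtin_cpu_supports("avx");
#else
    return 0;
#endif
}

#if defined(__x86_64__) && defined(__GNUC__)
typedef long long sketch_v2di __attribute__((vector_size(16), may_alias));
#endif

/* 16-byte load as a single vmovdqa. Returns 1 on success, 0 if the
   fast path is unavailable and the caller must use another method. */
static int sketch_try_load16(const void *p, sketch_u128 *out)
{
#if defined(__x86_64__) && defined(__GNUC__)
    if (!sketch_fast_path_usable())
        return 0;
    sketch_v2di v;
    __asm__ volatile("vmovdqa %1, %0"
                     : "=x"(v)
                     : "m"(*(const sketch_v2di *)p)
                     : "memory");
    memcpy(out, &v, sizeof *out);
    return 1;
#else
    (void)p; (void)out;
    return 0;
#endif
}

/* 16-byte store as vmovdqa followed by mfence. Same return convention. */
static int sketch_try_store16(void *p, sketch_u128 val)
{
#if defined(__x86_64__) && defined(__GNUC__)
    if (!sketch_fast_path_usable())
        return 0;
    sketch_v2di v;
    memcpy(&v, &val, sizeof v);
    __asm__ volatile("vmovdqa %1, %0\n\tmfence"
                     : "=m"(*(sketch_v2di *)p)
                     : "x"(v)
                     : "memory");
    return 1;
#else
    (void)p; (void)val;
    return 0;
#endif
}
```

A caller would first try sketch_try_store16 or sketch_try_load16 on a
16-byte-aligned object and fall back to a CMPXCHG16B loop or a lock when
either returns 0.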

    2022-03-17  Jakub Jelinek  <ja...@redhat.com>

            PR target/104688
            * Makefile.am (IFUNC_OPTIONS): Change on x86_64 to -mcx16 -mcx16.
            (libatomic_la_LIBADD): Add $(addsuffix _16_2_.lo,$(SIZEOBJS)) for
            x86_64.
            * Makefile.in: Regenerated.
            * config/x86/host-config.h (IFUNC_COND_1): For x86_64 define to
            both AVX and CMPXCHG16B bits.
            (IFUNC_COND_2): Define.
            (IFUNC_NCOND): For x86_64 define to 2 * (N == 16).
            (MAYBE_HAVE_ATOMIC_CAS_16, MAYBE_HAVE_ATOMIC_EXCHANGE_16,
            MAYBE_HAVE_ATOMIC_LDST_16): Define to IFUNC_COND_2 rather than
            IFUNC_COND_1.
            (HAVE_ATOMIC_CAS_16): Redefine to 1 whenever IFUNC_ALT != 0.
            (HAVE_ATOMIC_LDST_16): Redefine to 1 whenever IFUNC_ALT == 1.
            (atomic_compare_exchange_n): Define whenever IFUNC_ALT != 0
            on x86_64 for N == 16.
            (__atomic_load_n, __atomic_store_n): Redefine whenever
            IFUNC_ALT == 1 on x86_64 for N == 16.
            (atomic_load_n, atomic_store_n): New functions.
            * config/x86/init.c (__libat_feat1_init): On x86_64 clear bit_AVX
            if CPU vendor is not Intel.
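
The variant selection that the ChangeLog entries above describe can be
sketched as a pure function over the CPUID.01H:ECX feature word. The bit
positions (13 for CMPXCHG16B, 28 for AVX) are those defined by CPUID; all
identifiers prefixed with sketch_ or SKETCH_ are hypothetical and are not
the macros used in config/x86/host-config.h or config/x86/init.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature bits in CPUID.01H:ECX. */
#define SKETCH_BIT_CMPXCHG16B (1u << 13)
#define SKETCH_BIT_AVX        (1u << 28)

/* Hypothetical labels for the 16-byte implementations. */
enum sketch_variant {
    SKETCH_GENERIC  = 0,  /* no usable 16-byte instruction, use fallback */
    SKETCH_AVX_CX16 = 1,  /* vmovdqa load/store plus CMPXCHG16B for CAS  */
    SKETCH_CX16     = 2   /* CMPXCHG16B only                             */
};

/* Detection step: on non-Intel CPUs, drop the AVX bit so that the
   AVX-based variant is never selected there. */
static uint32_t sketch_filter_feat1(uint32_t ecx, bool vendor_is_intel)
{
    if (!vendor_is_intel)
        ecx &= ~SKETCH_BIT_AVX;
    return ecx;
}

/* Selection step: the AVX + CMPXCHG16B variant takes priority over the
   pure CMPXCHG16B one. */
static enum sketch_variant sketch_select(uint32_t ecx)
{
    const uint32_t both = SKETCH_BIT_AVX | SKETCH_BIT_CMPXCHG16B;

    if ((ecx & both) == both)
        return SKETCH_AVX_CX16;
    if (ecx & SKETCH_BIT_CMPXCHG16B)
        return SKETCH_CX16;
    return SKETCH_GENERIC;
}
```

Keeping the vendor filter separate from the selection means that, should
another vendor document the same guarantee, only the filter would need to
change.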

    (cherry picked from commit 1d47c0512a265d4bb3ab9e56259fd1e4f4d42c75)
