Victor Do Nascimento <victor.donascime...@arm.com> writes:
> The armv9.4-a architectural revision adds three new atomic operations
> associated with the LSE128 feature:
>
>   * LDCLRP - Atomic AND NOT (bitclear) of a location with 128-bit
>   value held in a pair of registers, with original data loaded into
>   the same 2 registers.
>   * LDSETP - Atomic OR (bitset) of a location with 128-bit value held
>   in a pair of registers, with original data loaded into the same 2
>   registers.
>   * SWPP - Atomic swap of one 128-bit value with 128-bit value held
>   in a pair of registers.
>
> This patch adds the logic required to make use of these when the
> architectural feature is present and a suitable assembler available.
>
> In order to do this, the following changes are made:
>
>   1. Add a configure-time check to check for LSE128 support in the
>   assembler.
>   2. Edit host-config.h so that when N == 16, nifunc = 2.
>   3. Where available due to LSE128, implement the second ifunc, making
>   use of the novel instructions.
>   4. For atomic functions unable to make use of these new
>   instructions, define a new alias which causes the _i1 function
>   variant to point ahead to the corresponding _i2 implementation.
>
> libatomic/ChangeLog:
>
>       * Makefile.am (AM_CPPFLAGS): add conditional setting of
>       -DHAVE_FEAT_LSE128.
>       * acinclude.m4 (LIBAT_TEST_FEAT_LSE128): New.
>       * config/linux/aarch64/atomic_16.S (LSE128): New macro
>       definition.
>       (libat_exchange_16): New LSE128 variant.
>       (libat_fetch_or_16): Likewise.
>       (libat_or_fetch_16): Likewise.
>       (libat_fetch_and_16): Likewise.
>       (libat_and_fetch_16): Likewise.
>       * config/linux/aarch64/host-config.h (IFUNC_COND_2): New.
>       (IFUNC_NCOND): Add operand size checking.
>       (has_lse2): Renamed from `ifunc1`.
>       (has_lse128): New.
>       (HAS_LSE128): Likewise.
>       * libatomic/configure.ac: Add call to LIBAT_TEST_FEAT_LSE128.
>       * configure (ac_subst_vars): Regenerated via autoreconf.
>       * libatomic/Makefile.in: Likewise.
>       * libatomic/auto-config.h.in: Likewise.
> ---
>  libatomic/Makefile.am                        |   3 +
>  libatomic/Makefile.in                        |   1 +
>  libatomic/acinclude.m4                       |  19 +++
>  libatomic/auto-config.h.in                   |   3 +
>  libatomic/config/linux/aarch64/atomic_16.S   | 170 ++++++++++++++++++-
>  libatomic/config/linux/aarch64/host-config.h |  29 +++-
>  libatomic/configure                          |  59 ++++++-
>  libatomic/configure.ac                       |   1 +
>  8 files changed, 276 insertions(+), 9 deletions(-)
>
> [...]
> diff --git a/libatomic/acinclude.m4 b/libatomic/acinclude.m4
> index f35ab5b60a5..4197db8f404 100644
> --- a/libatomic/acinclude.m4
> +++ b/libatomic/acinclude.m4
> @@ -83,6 +83,25 @@ AC_DEFUN([LIBAT_TEST_ATOMIC_BUILTIN],[
>    ])
>  ])
>  
> +dnl
> +dnl Test if the host assembler supports armv9.4-a LSE128 isns.
> +dnl
> +AC_DEFUN([LIBAT_TEST_FEAT_LSE128],[
> +  AC_CACHE_CHECK([for armv9.4-a LSE128 insn support],
> +    [libat_cv_have_feat_lse128],[
> +    AC_LANG_CONFTEST([AC_LANG_PROGRAM([],[asm(".arch armv9-a+lse128")])])
> +    if AC_TRY_EVAL(ac_link); then

ac_compile should be enough for this.  The link step isn't really
adding anything.

> +      eval libat_cv_have_feat_lse128=yes
> +    else
> +      eval libat_cv_have_feat_lse128=no
> +    fi
> +    rm -f conftest*
> +  ])
> +  LIBAT_DEFINE_YESNO([HAVE_FEAT_LSE128], [$libat_cv_have_feat_lse128],
> +     [Have LSE128 support for 16 byte integers.])
> +  AM_CONDITIONAL([ARCH_AARCH64_HAVE_LSE128], [test 
> x$libat_cv_have_feat_lse128 = xyes])
> +])
> +
>  dnl
>  dnl Test if we have __atomic_load and __atomic_store for mode $1, size $2
>  dnl
> [...]
> @@ -206,6 +211,31 @@ ENTRY (libat_exchange_16)
>  END (libat_exchange_16)
>  
>  
> +#if HAVE_FEAT_LSE128
> +ENTRY_FEAT (libat_exchange_16, LSE128)
> +     mov     tmp0, x0
> +     mov     res0, in0
> +     mov     res1, in1
> +     cbnz    w4, 1f
> +
> +     /* RELAXED.  */
> +     swpp    res0, res1, [tmp0]
> +     ret
> +1:
> +     cmp     w4, ACQUIRE
> +     b.hi    2f
> +
> +     /* ACQUIRE/CONSUME.  */
> +     swppa   res0, res1, [tmp0]
> +     ret
> +
> +     /* RELEASE/ACQ_REL/SEQ_CST.  */
> +2:   swppal  res0, res1, [tmp0]
> +     ret
> +END_FEAT (libat_exchange_16, LSE128)
> +#endif

Is there no benefit to using SWPPL for RELEASE here?  Similarly for the
others.

Looks good otherwise.

Thanks,
Richard

Reply via email to