On 1/5/24 11:47, Richard Sandiford wrote:
Victor Do Nascimento <victor.donascime...@arm.com> writes:
The armv9.4-a architectural revision adds three new atomic operations
associated with the LSE128 feature:

   * LDCLRP - Atomic AND NOT (bitclear) of a location with 128-bit
   value held in a pair of registers, with original data loaded into
   the same 2 registers.
   * LDSETP - Atomic OR (bitset) of a location with 128-bit value held
   in a pair of registers, with original data loaded into the same 2
   registers.
   * SWPP - Atomic swap of one 128-bit value with 128-bit value held
   in a pair of registers.

This patch adds the logic required to make use of these when the
architectural feature is present and a suitable assembler available.

In order to do this, the following changes are made:

   1. Add a configure-time check to check for LSE128 support in the
   assembler.
   2. Edit host-config.h so that when N == 16, nifunc = 2.
   3. Where available due to LSE128, implement the second ifunc, making
   use of the novel instructions.
   4. For atomic functions unable to make use of these new
   instructions, define a new alias which causes the _i1 function
   variant to point ahead to the corresponding _i2 implementation.

libatomic/ChangeLog:

        * Makefile.am (AM_CPPFLAGS): add conditional setting of
        -DHAVE_FEAT_LSE128.
        * acinclude.m4 (LIBAT_TEST_FEAT_LSE128): New.
        * config/linux/aarch64/atomic_16.S (LSE128): New macro
        definition.
        (libat_exchange_16): New LSE128 variant.
        (libat_fetch_or_16): Likewise.
        (libat_or_fetch_16): Likewise.
        (libat_fetch_and_16): Likewise.
        (libat_and_fetch_16): Likewise.
        * config/linux/aarch64/host-config.h (IFUNC_COND_2): New.
        (IFUNC_NCOND): Add operand size checking.
        (has_lse2): Renamed from `ifunc1`.
        (has_lse128): New.
        (HAS_LSE128): Likewise.
        * libatomic/configure.ac: Add call to LIBAT_TEST_FEAT_LSE128.
        * configure (ac_subst_vars): Regenerated via autoreconf.
        * libatomic/Makefile.in: Likewise.
        * libatomic/auto-config.h.in: Likewise.
---
  libatomic/Makefile.am                        |   3 +
  libatomic/Makefile.in                        |   1 +
  libatomic/acinclude.m4                       |  19 +++
  libatomic/auto-config.h.in                   |   3 +
  libatomic/config/linux/aarch64/atomic_16.S   | 170 ++++++++++++++++++-
  libatomic/config/linux/aarch64/host-config.h |  29 +++-
  libatomic/configure                          |  59 ++++++-
  libatomic/configure.ac                       |   1 +
  8 files changed, 276 insertions(+), 9 deletions(-)

[...]
diff --git a/libatomic/acinclude.m4 b/libatomic/acinclude.m4
index f35ab5b60a5..4197db8f404 100644
--- a/libatomic/acinclude.m4
+++ b/libatomic/acinclude.m4
@@ -83,6 +83,25 @@ AC_DEFUN([LIBAT_TEST_ATOMIC_BUILTIN],[
    ])
  ])
+dnl
+dnl Test if the host assembler supports armv9.4-a LSE128 isns.
+dnl
+AC_DEFUN([LIBAT_TEST_FEAT_LSE128],[
+  AC_CACHE_CHECK([for armv9.4-a LSE128 insn support],
+    [libat_cv_have_feat_lse128],[
+    AC_LANG_CONFTEST([AC_LANG_PROGRAM([],[asm(".arch armv9-a+lse128")])])
+    if AC_TRY_EVAL(ac_link); then

ac_compile should be enough for this.  The link step isn't really
adding anything.

+      eval libat_cv_have_feat_lse128=yes
+    else
+      eval libat_cv_have_feat_lse128=no
+    fi
+    rm -f conftest*
+  ])
+  LIBAT_DEFINE_YESNO([HAVE_FEAT_LSE128], [$libat_cv_have_feat_lse128],
+       [Have LSE128 support for 16 byte integers.])
+  AM_CONDITIONAL([ARCH_AARCH64_HAVE_LSE128], [test x$libat_cv_have_feat_lse128 
= xyes])
+])
+
  dnl
  dnl Test if we have __atomic_load and __atomic_store for mode $1, size $2
  dnl
[...]
@@ -206,6 +211,31 @@ ENTRY (libat_exchange_16)
  END (libat_exchange_16)
+#if HAVE_FEAT_LSE128
+ENTRY_FEAT (libat_exchange_16, LSE128)
+       mov     tmp0, x0
+       mov     res0, in0
+       mov     res1, in1
+       cbnz    w4, 1f
+
+       /* RELAXED.  */
+       swpp    res0, res1, [tmp0]
+       ret
+1:
+       cmp     w4, ACQUIRE
+       b.hi    2f
+
+       /* ACQUIRE/CONSUME.  */
+       swppa   res0, res1, [tmp0]
+       ret
+
+       /* RELEASE/ACQ_REL/SEQ_CST.  */
+2:     swppal  res0, res1, [tmp0]
+       ret
+END_FEAT (libat_exchange_16, LSE128)
+#endif

Is there no benefit to using SWPPL for RELEASE here?  Similarly for the
others.

We started off implementing all possible memory orderings available. Wilco saw value in merging less restricted orderings into more restricted ones - mainly to reduce codesize in less frequently used atomics.

This saw us combine RELEASE and ACQ_REL/SEQ_CST cases to make functions a little smaller.


Looks good otherwise.

Thanks,
Richard

Reply via email to