Ah, whoops, I didn't see this before sending off V4 just now; my apologies.
I'll try my best to get this implemented before the end of the day so that
it doesn't miss the deadline.

On 09/01/2025 23:04, Richard Sandiford wrote:
Akram Ahmad <akram.ah...@arm.com> writes:
In the above example, subtraction replaces the adds with subs and the
csinv with csel. The 32-bit case follows the same approach. Arithmetic
with a constant operand is simplified further by directly storing the
saturating limit in the temporary register, resulting in only three
instructions being used. It is important to note that this only works
when early-ra is disabled due to an early-ra bug which erroneously
assigns FP registers to the operands; if early-ra is enabled, then the
original behaviour (NEON instruction) occurs.
This can be fixed by changing:

        case CT_REGISTER:
          if (REG_P (op) || SUBREG_P (op))
            return true;
          break;

to:

        case CT_REGISTER:
          if (REG_P (op) || SUBREG_P (op) || GET_CODE (op) == SCRATCH)
            return true;
          break;

But I can test & post that as a follow-up if you prefer.
Yes please, if that's not too much trouble. Would that have to go into
another patch?
+
  ;; Double vector modes.
  (define_mode_iterator VD [V8QI V4HI V4HF V2SI V2SF V4BF])
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c
new file mode 100644
index 00000000000..2b72be7b0d7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c
@@ -0,0 +1,79 @@
+/* { dg-do assemble { target { aarch64*-*-* } } } */
+/* { dg-options "-O2 --save-temps -ftree-vectorize" } */
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+/*
+** uadd_lane: { xfail *-*-* }
+**     dup\tv([0-9]+).8b, w0
+**     uqadd\tb([0-9]+), (?:b\1, b0|b0, b\1)
+**     umov\tw0, v\2.b\[0\]
+**     ret
+*/
What's the reason behind the xfail?  Is it the early-ra thing, or
something else?  (You might already have covered this, sorry.)

xfailing is fine if it needs further optimisation, was just curious :)
This is because of a missing pattern in match.pd (I've sent another patch
upstream to add the missing pattern, although it may have gotten lost).
Once that pattern is added though, this should be recognised as .SAT_SUB,
and the new instructions will appear.
[...]
diff --git a/gcc/testsuite/gcc.target/aarch64/saturating-arithmetic-signed.c b/gcc/testsuite/gcc.target/aarch64/saturating-arithmetic-signed.c
new file mode 100644
index 00000000000..0fc6804683a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/saturating-arithmetic-signed.c
@@ -0,0 +1,270 @@
+/* { dg-do run } */
+/* { dg-options "-O2 --save-temps -mearly-ra=none" } */
It'd be worth adding -fno-schedule-insns2 here.  Same for
saturating_arithmetic_1.c and saturating_arithmetic_2.c.  The reason
is that:

+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#include <limits.h>
+#include <stdbool.h>
+#include <stdint.h>
+
+/*
+** sadd32:
+**     asr     w([0-9]+), w1, 31
+**     adds    w([0-9]+), (?:w0, w1|w1, w0)
+**     eor     w\1, w\1, -2147483648
+**     csinv   w0, w\2, w\1, vc
+**     ret
+*/
...the first two instructions can be in either order, and similarly
for the second and third.

Really nice tests though :)

Thanks! That also makes a lot of sense. I was wary of assuming the
instructions would always be in that exact order, so it's good to know I
can try and specify that.
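
For reference, the sadd32 sequence quoted above (and the subs/csel variant
described for subtraction) can be modelled in plain C. This is only a
sketch under stated assumptions: the function names are hypothetical and
not part of the patch, and the GCC/Clang overflow builtins stand in for
the V flag set by adds/subs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical models of the scalar sequences; not taken from the patch.  */

static int32_t
sadd32_model (int32_t a, int32_t b)
{
  int32_t sum;
  /* asr + eor: tmp is INT32_MIN when b >= 0 and INT32_MAX when b < 0.  */
  int32_t tmp = b < 0 ? INT32_MAX : INT32_MIN;
  /* adds sets V on signed overflow; the builtin reports the same event.  */
  if (!__builtin_add_overflow (a, b, &sum))
    return sum;		/* csinv with vc: no overflow, keep the sum.  */
  return ~tmp;		/* csinv otherwise selects the inverted limit.  */
}

static int32_t
ssub32_model (int32_t a, int32_t b)
{
  int32_t diff;
  int32_t tmp = b < 0 ? INT32_MAX : INT32_MIN;
  /* subs sets V on signed overflow.  */
  if (!__builtin_sub_overflow (a, b, &diff))
    return diff;	/* csel with vc: no overflow, keep the difference.  */
  return tmp;		/* csel otherwise selects the limit directly.  */
}

/* Small self-check of the saturating behaviour at both limits.  */
static void
saturating_model_selfcheck (void)
{
  assert (sadd32_model (INT32_MAX, 1) == INT32_MAX);
  assert (sadd32_model (INT32_MIN, -1) == INT32_MIN);
  assert (ssub32_model (INT32_MIN, 1) == INT32_MIN);
  assert (ssub32_model (INT32_MAX, -1) == INT32_MAX);
}
```

The limit depends only on the sign of the second operand, which is why
the asr/eor pair has no dependency on the adds and the scheduler is free
to reorder them.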
