Akram Ahmad <akram.ah...@arm.com> writes:
> Ah whoops- I didn't see this before sending off V4 just now, my apologies.
> I'll try my best to get this implemented before the end of the day so that
> it doesn't miss the deadline.

No rush!  The delay here is entirely my fault, so no problem if the
patch lands in early stage 4.

> On 09/01/2025 23:04, Richard Sandiford wrote:
>> Akram Ahmad <akram.ah...@arm.com> writes:
>>> In the above example, subtraction replaces the adds with subs and the
>>> csinv with csel. The 32-bit case follows the same approach. Arithmetic
>>> with a constant operand is simplified further by directly storing the
>>> saturating limit in the temporary register, resulting in only three
>>> instructions being used. It is important to note that this only works
>>> when early-ra is disabled due to an early-ra bug which erroneously
>>> assigns FP registers to the operands; if early-ra is enabled, then the
>>> original behaviour (NEON instruction) occurs.
>> This can be fixed by changing:
>>
>>   case CT_REGISTER:
>>     if (REG_P (op) || SUBREG_P (op))
>>       return true;
>>     break;
>>
>> to:
>>
>>   case CT_REGISTER:
>>     if (REG_P (op) || SUBREG_P (op) || GET_CODE (op) == SCRATCH)
>>       return true;
>>     break;
>>
>> But I can test & post that as a follow-up if you prefer.
> Yes please, if that's not too much trouble- would that have to go into
> another patch?

Yeah.  But early-ra pessimisations are regressions, since early-ra was
new to GCC 14.  So that can go in during stage 4 as well.

>>> +
>>> ;; Double vector modes.
>>> (define_mode_iterator VD [V8QI V4HI V4HF V2SI V2SF V4BF])
>>>
>>> diff --git
>>> a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c
>>>
>>> b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c
>>> new file mode 100644
>>> index 00000000000..2b72be7b0d7
>>> --- /dev/null
>>> +++
>>> b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c
>>> @@ -0,0 +1,79 @@
>>> +/* { dg-do assemble { target { aarch64*-*-* } } } */
>>> +/* { dg-options "-O2 --save-temps -ftree-vectorize" } */
>>> +/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
>>> +
>>> +/*
>>> +** uadd_lane: { xfail *-*-* }
>>> +** dup\tv([0-9]+).8b, w0
>>> +** uqadd\tb([0-9]+), (?:b\1, b0|b0, b\1)
>>> +** umov\tw0, v\2.b\[0\]
>>> +** ret
>>> +*/
>> What's the reason behind the xfail?  Is it the early-ra thing, or
>> something else?  (You might already have covered this, sorry.)
>>
>> xfailing is fine if it needs further optimisation, was just curious :)
> This is because of a missing pattern in match.pd (I've sent another
> patch upstream to add the missing pattern, although it may have gotten
> lost). Once that pattern is added though, this should be recognised as
> .SAT_SUB, and the new instructions will appear.

Ah, great!

Thanks,
Richard
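
For anyone following the thread without the patch to hand, here is a minimal sketch of the scalar unsigned saturating add and subtract under discussion, written as plain C. The function names sat_uadd32 and sat_usub32 are hypothetical and are not taken from the patch or its testsuite; whether a given compiler turns these into the sequences described above depends on the patch and on the match.pd patterns mentioned in the thread.

```c
#include <stdint.h>

/* Hypothetical helper, not from the patch: unsigned saturating add.
   Unsigned addition wraps on overflow, so a wrapped sum is smaller
   than either operand; in that case clamp to the type's maximum.  */
static inline uint32_t
sat_uadd32 (uint32_t a, uint32_t b)
{
  uint32_t sum = a + b;
  return sum < a ? UINT32_MAX : sum;
}

/* Hypothetical helper, not from the patch: unsigned saturating
   subtract.  If b is at least a, the result would go below zero,
   so clamp to zero instead.  */
static inline uint32_t
sat_usub32 (uint32_t a, uint32_t b)
{
  return a > b ? a - b : 0;
}
```

With these definitions, sat_uadd32 (UINT32_MAX, 1) clamps to UINT32_MAX and sat_usub32 (3, 5) clamps to 0, while in-range inputs behave like ordinary addition and subtraction.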