> -----Original Message----- > From: Tamar Christina <tamar.christ...@arm.com> > Sent: Tuesday, September 2, 2025 12:15 PM > To: Kyrylo Tkachov <ktkac...@nvidia.com> > Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; Richard Earnshaw > <richard.earns...@arm.com>; ktkac...@gcc.gnu.org; Richard Sandiford > <richard.sandif...@arm.com>; Alex Coplan <alex.cop...@arm.com>; > andrew.pin...@oss.qualcomm.com > Subject: RE: [PATCH 2/3]AArch64: Add support for addhn vectorizer optabs for > Adv.SIMD > > > -----Original Message----- > > From: Kyrylo Tkachov <ktkac...@nvidia.com> > > Sent: Tuesday, September 2, 2025 12:03 PM > > To: Tamar Christina <tamar.christ...@arm.com> > > Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; Richard Earnshaw > > <richard.earns...@arm.com>; ktkac...@gcc.gnu.org; Richard Sandiford > > <richard.sandif...@arm.com>; Alex Coplan <alex.cop...@arm.com>; > > andrew.pin...@oss.qualcomm.com > > Subject: Re: [PATCH 2/3]AArch64: Add support for addhn vectorizer optabs for > > Adv.SIMD > > > > Hi Tamar, > > > > > On 2 Sep 2025, at 11:46, Tamar Christina <tamar.christ...@arm.com> wrote: > > > > > > This implements the new vector optabs vec_<su>addh_narrow<mode> > > > adding support for in-vectorizer use for early break. > > > > The Advanced SIMD ADDHN instruction doesn’t perform a widening of the > > operands for the addition as far as I know. > > That is why there are no “SADDHN” or “UADDHN” instructions, which would > > correspond to a sign-extend or zero-extend of the operands. > > So I think there’s something off here…. > > Hmm yeah I confused that size is the size of the destination not the source.
To clarify, while I did mis-read the pseudo-code, there isn't a correctness issue With the patch, hence why the tests pass. For Boolean vectors the values are 0 or -1 and we care about the MSB for the case where we add -1 + -1. So while I don't need the signed vs unsigned distinction, the patch does do the right thing. Tamar > > Let me respin, > Thanks > > > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu, > > > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu > > > -m32, -m64 and no issues. > > > > > > Ok for master? > > > > > > Thanks, > > > Tamar > > > > > > gcc/ChangeLog: > > > > > > * config/aarch64/aarch64-simd.md (vec_<su>addh_narrow<mode>): New. > > > * config/aarch64/iterators.md (UNSPEC_SADDHN, UNSPEC_UADDHN): New. > > > (su, ADDHN): Use them. > > > > > > gcc/testsuite/ChangeLog: > > > > > > * gcc.target/aarch64/vect-addhn_1.c: New test. > > > > > > --- > > > diff --git a/gcc/config/aarch64/aarch64-simd.md > > b/gcc/config/aarch64/aarch64-simd.md > > > index > > > 8b75c3d7f6d5ddc5c44f841da961423caaebe8b8..a1737792093ffea5f72c10c1c > > b4df6322280dbf4 100644 > > > --- a/gcc/config/aarch64/aarch64-simd.md > > > +++ b/gcc/config/aarch64/aarch64-simd.md > > > @@ -949,6 +949,22 @@ (define_expand "vec_widen_<su>abd_lo_<mode>" > > > } > > > ) > > > > > > +(define_expand "vec_<su>addh_narrow<mode>" > > > + [(set (match_operand:<VNARROWQ> 0 "register_operand") > > > + (unspec:VQN [(plus:VQN (match_operand:VQN 1 "register_operand") > > > + (match_operand:VQN 2 "register_operand"))] > > > + ADDHN))] > > > + "TARGET_SIMD" > > > + { > > > + rtx shft > > > + = aarch64_simd_gen_const_vector_dup (<MODE>mode, > > > + GET_MODE_UNIT_BITSIZE (<MODE>mode) / 2); > > > + emit_insn (gen_aarch64_addhn<mode>_insn (operands[0], operands[1], > > > + operands[2], shft)); > > > + DONE; > > > > … So this would produce the exact same RTL for the “vec_saddh_narrow” and > > “vec_uaddh_narrow” cases. > > > > > > > + } > > > +) > > > + > > > (define_insn "aarch64_<su>abal<mode>" > > > [(set (match_operand:<VWIDE> 0 "register_operand" "=w") > > > (plus:<VWIDE> > > > diff --git a/gcc/config/aarch64/iterators.md > b/gcc/config/aarch64/iterators.md > > > index > > > b15e57843fed78ffe7edd927cfd7acbf395414a4..c65509fe949ad3056d194f05f9 > > 01e76382d62ad8 100644 > > > --- a/gcc/config/aarch64/iterators.md > > > +++ b/gcc/config/aarch64/iterators.md > > > @@ -806,6 +806,8 @@ (define_c_enum "unspec" > > > UNSPEC_UHADD ; Used in aarch64-simd.md. > > > UNSPEC_SRHADD ; Used in aarch64-simd.md. > > > UNSPEC_URHADD ; Used in aarch64-simd.md. > > > + UNSPEC_SADDHN ; Used in aarch64-simd.md. > > > + UNSPEC_UADDHN ; Used in aarch64-simd.md. > > > > As per above, there are no such instructions in Advanced SIMD so it would be > > confusing to have UNSPEC_ values to represent them. > > Thanks, > > Kyrill > > > > > > > UNSPEC_SHSUB ; Used in aarch64-simd.md. > > > UNSPEC_UHSUB ; Used in aarch64-simd.md. > > > UNSPEC_SQDMULH ; Used in aarch64-simd.md. > > > @@ -3251,6 +3253,8 @@ (define_int_iterator HADD [UNSPEC_SHADD > > UNSPEC_UHADD]) > > > > > > (define_int_iterator RHADD [UNSPEC_SRHADD UNSPEC_URHADD]) > > > > > > +(define_int_iterator ADDHN [UNSPEC_SADDHN UNSPEC_UADDHN]) > > > + > > > (define_int_iterator BSL_DUP [1 2]) > > > > > > (define_int_iterator DOTPROD [UNSPEC_SDOT UNSPEC_UDOT]) > > > @@ -4250,7 +4254,8 @@ (define_int_attr su [(UNSPEC_SADDV "s") > > > (UNSPEC_COND_SCVTF "s") > > > (UNSPEC_COND_UCVTF "u") > > > (UNSPEC_SMULHS "s") (UNSPEC_UMULHS "u") > > > - (UNSPEC_SMULHRS "s") (UNSPEC_UMULHRS "u")]) > > > + (UNSPEC_SMULHRS "s") (UNSPEC_UMULHRS "u") > > > + (UNSPEC_SADDHN "s") (UNSPEC_UADDHN "u")]) > > > > > > (define_int_attr sur [(UNSPEC_SHADD "s") (UNSPEC_UHADD "u") > > > (UNSPEC_SRHADD "sr") (UNSPEC_URHADD "ur") > > > diff --git a/gcc/testsuite/gcc.target/aarch64/vect-addhn_1.c > > b/gcc/testsuite/gcc.target/aarch64/vect-addhn_1.c > > > new file mode 100644 > > > index > > > 0000000000000000000000000000000000000000..b6f66f979ff71d40d1e092 > > 92c1c12df3c262ce9c > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.target/aarch64/vect-addhn_1.c > > > @@ -0,0 +1,88 @@ > > > +/* { dg-require-effective-target vect_int } */ > > > + > > > +#include <stdint.h> > > > +#include <stdio.h> > > > + > > > +#include "tree-vect.h" > > > + > > > +#define N 1000 > > > +#define CHECK_ERROR(cond, fmt, ...) \ > > > + do { if (cond) { printf(fmt "\n", ##__VA_ARGS__); __builtin_abort (); > > > } } while > > (0) > > > + > > > +// Generates all test components for a given type combo > > > +#define TEST_COMBO(A_TYPE, C_TYPE, CAST_TYPE, SHIFT) > > > \ > > > + A_TYPE a_##A_TYPE##_##C_TYPE[N]; > > > \ > > > + A_TYPE b_##A_TYPE##_##C_TYPE[N]; > > > \ > > > + C_TYPE c_##A_TYPE##_##C_TYPE[N]; > > > \ > > > + C_TYPE ref_##A_TYPE##_##C_TYPE[N]; > > > \ > > > + > > > \ > > > + void init_##A_TYPE##_##C_TYPE() { > > > \ > > > + _Pragma ("GCC novector") \ > > > + for (int i = 0; i < N; i++) { > > > \ > > > + a_##A_TYPE##_##C_TYPE[i] = (A_TYPE)(i * 3); > > > \ > > > + b_##A_TYPE##_##C_TYPE[i] = (A_TYPE)(i * 7); > > > \ > > > + } > > > \ > > > + } > > > \ > > > + > > > \ > > > + void foo_##A_TYPE##_##C_TYPE() { > > > \ > > > + for (int i = 0; i < N; i++) > > > \ > > > + c_##A_TYPE##_##C_TYPE[i] = > > > \ > > > + ((CAST_TYPE)a_##A_TYPE##_##C_TYPE[i] + > > > \ > > > + (CAST_TYPE)b_##A_TYPE##_##C_TYPE[i]) >> SHIFT; > > > \ > > > + } > > > \ > > > + > > > \ > > > + void ref_##A_TYPE##_##C_TYPE##_compute() { > > > \ > > > + _Pragma ("GCC novector") \ > > > + for (int i = 0; i < N; i++) > > > \ > > > + ref_##A_TYPE##_##C_TYPE[i] = > > > \ > > > + ((CAST_TYPE)a_##A_TYPE##_##C_TYPE[i] + > > > \ > > > + (CAST_TYPE)b_##A_TYPE##_##C_TYPE[i]) >> SHIFT; > > > \ > > > + } > > > \ > > > + > > > \ > > > + void validate_##A_TYPE##_##C_TYPE(const char* variant_name) { > \ > > > + _Pragma ("GCC novector") \ > > > + for (int i = 0; i < N; i++) { > > > \ > > > + if (c_##A_TYPE##_##C_TYPE[i] != ref_##A_TYPE##_##C_TYPE[i]) { > > > \ > > > + printf("FAIL [%s]: Index %d: got %lld, expected %lld\n", > > > \ > > > + variant_name, i, > > > \ > > > + (long long)c_##A_TYPE##_##C_TYPE[i], > > > \ > > > + (long long)ref_##A_TYPE##_##C_TYPE[i]); > > > \ > > > + __builtin_abort (); > > > \ > > > + } > > > \ > > > + } > > > \ > > > + } > > > + > > > +// Runs the test for one combo with name output > > > +#define RUN_COMBO(A_TYPE, C_TYPE) \ > > > + do { \ > > > + init_##A_TYPE##_##C_TYPE(); \ > > > + foo_##A_TYPE##_##C_TYPE(); \ > > > + ref_##A_TYPE##_##C_TYPE##_compute(); \ > > > + validate_##A_TYPE##_##C_TYPE(#A_TYPE " -> " #C_TYPE); \ > > > + } while (0) > > > + > > > +// Instantiate all valid combinations > > > +TEST_COMBO(int16_t, int8_t, int32_t, 8) > > > +TEST_COMBO(uint16_t, uint8_t, uint32_t, 8) > > > +TEST_COMBO(int32_t, int16_t, int64_t, 16) > > > +TEST_COMBO(uint32_t, uint16_t, uint64_t, 16) > > > +#if defined(__aarch64__) > > > +TEST_COMBO(int64_t, int32_t, __int128_t, 32) > > > +TEST_COMBO(uint64_t, uint32_t, unsigned __int128, 32) > > > +#endif > > > + > > > +int main() { > > > + check_vect (); > > > + > > > + RUN_COMBO(int16_t, int8_t); > > > + RUN_COMBO(uint16_t, uint8_t); > > > + RUN_COMBO(int32_t, int16_t); > > > + RUN_COMBO(uint32_t, uint16_t); > > > +#if defined(__aarch64__) > > > + RUN_COMBO(int64_t, int32_t); > > > + RUN_COMBO(uint64_t, uint32_t); > > > +#endif > > > + > > > + return 0; > > > +} > > > + > > > > > > > > > -- > > > <rb19744.patch>