> -----Original Message-----
> From: Gcc-patches <gcc-patches-bounces+tamar.christina=arm....@gcc.gnu.org> On Behalf Of Tamar Christina via Gcc-patches
> Sent: Friday, September 23, 2022 9:14 AM
> To: Richard Biener <rguent...@suse.de>
> Cc: Richard Sandiford <richard.sandif...@arm.com>; nd <n...@arm.com>; Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>; juzhe.zh...@rivai.ai
> Subject: RE: [PATCH]middle-end Add optimized float addsub without needing VEC_PERM_EXPR.
>
> > -----Original Message-----
> > From: Richard Biener <rguent...@suse.de>
> > Sent: Friday, September 23, 2022 8:54 AM
> > To: Tamar Christina <tamar.christ...@arm.com>
> > Cc: Richard Sandiford <richard.sandif...@arm.com>; Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>; nd <n...@arm.com>; juzhe.zh...@rivai.ai
> > Subject: RE: [PATCH]middle-end Add optimized float addsub without needing VEC_PERM_EXPR.
> >
> > On Fri, 23 Sep 2022, Tamar Christina wrote:
> >
> > > Hi,
> > >
> > > Attached is the respun version of the patch.
> > >
> > > > >>
> > > > >> Wouldn't a target need to re-check if lanes are NaN or denormal
> > > > >> if after a SFmode lane operation a DFmode lane operation follows?
> > > > >> IIRC that is what usually makes punning "integer" vectors as FP
> > > > >> vectors costly.
> > >
> > > I don't believe this is a problem, due to NaNs not being a single
> > > value and, according to the standard, the sign bit doesn't change the
> > > meaning of a NaN.
> > >
> > > That's why specifically for negates generally no check is performed
> > > and it's assumed that if a value is a NaN going in, it's a NaN
> > > coming out, and this optimization doesn't change that.  Also under
> > > fast-math we don't guarantee a stable representation for NaN (or
> > > zeros, etc.) afaik.
> > >
> > > So if that is still a concern I could add && !HONOR_NANS () to the
> > > constraints.
> > >
> > > Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > > 	* match.pd: Add fneg/fadd rule.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > 	* gcc.target/aarch64/simd/addsub_1.c: New test.
> > > 	* gcc.target/aarch64/sve/addsub_1.c: New test.
> > >
> > > --- inline version of patch ---
> > >
> > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > index 1bb936fc4010f98f24bb97671350e8432c55b347..2617d56091dfbd41ae49f980ee0af3757f5ec1cf 100644
> > > --- a/gcc/match.pd
> > > +++ b/gcc/match.pd
> > > @@ -7916,6 +7916,59 @@ and,
> > >   (simplify (reduc (op @0 VECTOR_CST@1))
> > >     (op (reduc:type @0) (reduc:type @1))))
> > >
> > > +/* Simplify vector floating point operations of alternating sub/add pairs
> > > +   into using an fneg of a wider element type followed by a normal add.
> > > +   under IEEE 754 the fneg of the wider type will negate every even entry
> > > +   and when doing an add we get a sub of the even and add of every odd
> > > +   elements.  */
> > > +(simplify
> > > + (vec_perm (plus:c @0 @1) (minus @0 @1) VECTOR_CST@2)
> > > + (if (!VECTOR_INTEGER_TYPE_P (type) && !BYTES_BIG_ENDIAN)
> >
> > shouldn't this be FLOAT_WORDS_BIG_ENDIAN instead?
> >
> > I'm still concerned what
> >
> >   (neg:V2DF (subreg:V2DF (reg:V4SF) 0))
> >
> > means for architectures like RISC-V.  Can one "reformat" FP values in
> > vector registers so that two floats overlap a double (and then back)?
> >
> > I suppose you rely on target_can_change_mode_class to tell you that.
>
> Indeed, the documentation says:
>
> "This hook returns true if it is possible to bitcast values held in
> registers of class rclass from mode from to mode to and if doing so
> preserves the low-order bits that are common to both modes.  The result
> is only meaningful if rclass has registers that can hold both from and
> to."
>
> This implies to me that if the bitcast shouldn't be possible the hook
> should reject it.
> Of course you always know whether something is possible, but perhaps not
> whether it's cheap to do.
>
> The specific implementation for RISC-V seems to imply to me that they
> disallow any FP conversions, so that seems to be OK.
>
> > >
> > > + (with
> > > +  {
> > > +    /* Build a vector of integers from the tree mask.  */
> > > +    vec_perm_builder builder;
> > > +    if (!tree_to_vec_perm_builder (&builder, @2))
> > > +      return NULL_TREE;
> > > +
> > > +    /* Create a vec_perm_indices for the integer vector.  */
> > > +    poly_uint64 nelts = TYPE_VECTOR_SUBPARTS (type);
> > > +    vec_perm_indices sel (builder, 2, nelts);
> > > +  }
> > > +  (if (sel.series_p (0, 2, 0, 2))
> > > +   (with
> > > +    {
> > > +      machine_mode vec_mode = TYPE_MODE (type);
> > > +      auto elem_mode = GET_MODE_INNER (vec_mode);
> > > +      auto nunits = exact_div (GET_MODE_NUNITS (vec_mode), 2);
> > > +      tree stype;
> > > +      switch (elem_mode)
> > > +        {
> > > +        case E_HFmode:
> > > +          stype = float_type_node;
> > > +          break;
> > > +        case E_SFmode:
> > > +          stype = double_type_node;
> > > +          break;
> > > +        default:
> > > +          return NULL_TREE;
> > > +        }
> >
> > Can't you use GET_MODE_WIDER_MODE and double-check the mode-size
> > doubles?  I mean you obviously miss DFmode -> TFmode.
>
> Problem is I need the type, not the mode, but even
> build_pointer_type_for_mode requires the new scalar type.  So I couldn't
> find anything to help here given that there's no inverse relationship
> between modes and types.
>
I meant build_vector_type_for_mode here.

> > > +      tree ntype = build_vector_type (stype, nunits);
> > > +      if (!ntype)
> >
> > You want to check that the above results in a vector mode.
>
> Does it?  Technically you can cast a V2SF to both a V1DF or a DF, can't you?
> Both seem equally valid here.
>
> > > +        return NULL_TREE;
> > > +
> > > +      /* The format has to have a simple sign bit.  */
> > > +      const struct real_format *fmt = FLOAT_MODE_FORMAT (vec_mode);
> > > +      if (fmt == NULL)
> > > +        return NULL_TREE;
> > > +    }
> > > +    (if (fmt->signbit_rw == GET_MODE_UNIT_BITSIZE (vec_mode) - 1
> >
> > shouldn't this be a check on the component mode?  I think you'd want
> > to check that the bigger format signbit_rw is equal to the smaller
> > format mode size plus its signbit_rw or so?
>
> Tbh, both are somewhat weak guarantees.  In a previous patch of mine I'd
> added a new field "is_ieee" to the real formats to denote that they are
> an IEEE type.  Maybe I should revive that instead?
>
> Regards,
> Tamar
>
> > > +         && fmt->signbit_rw == fmt->signbit_ro
> > > +         && targetm.can_change_mode_class (TYPE_MODE (ntype),
> > > +                                           TYPE_MODE (type), ALL_REGS)
> > > +         && (optimize_vectors_before_lowering_p ()
> > > +             || target_supports_op_p (ntype, NEGATE_EXPR, optab_vector)))
> > > +     (plus (view_convert:type (negate (view_convert:ntype @1))) @0)))))))
> > > +
> > >  (simplify
> > >   (vec_perm @0 @1 VECTOR_CST@2)
> > >   (with
> > > diff --git a/gcc/testsuite/gcc.target/aarch64/simd/addsub_1.c
> > > b/gcc/testsuite/gcc.target/aarch64/simd/addsub_1.c
> > > new file mode 100644
> > > index 0000000000000000000000000000000000000000..1fb91a34c421bbd2894faa0dbbf1b47ad43310c4
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/aarch64/simd/addsub_1.c
> > > @@ -0,0 +1,56 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target arm_v8_2a_fp16_neon_ok } */
> > > +/* { dg-options "-Ofast" } */
> > > +/* { dg-add-options arm_v8_2a_fp16_neon } */
> > > +/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
> > > +
> > > +#pragma GCC target "+nosve"
> > > +
> > > +/*
> > > +** f1:
> > > +** ...
> > > +** fneg v[0-9]+.2d, v[0-9]+.2d
> > > +** fadd v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> > > +** ...
> > > +*/
> > > +void f1 (float *restrict a, float *restrict b, float *res, int n)
> > > +{
> > > +   for (int i = 0; i < (n & -4); i+=2)
> > > +    {
> > > +      res[i+0] = a[i+0] + b[i+0];
> > > +      res[i+1] = a[i+1] - b[i+1];
> > > +    }
> > > +}
> > > +
> > > +/*
> > > +** d1:
> > > +** ...
> > > +** fneg v[0-9]+.4s, v[0-9]+.4s
> > > +** fadd v[0-9]+.8h, v[0-9]+.8h, v[0-9]+.8h
> > > +** ...
> > > +*/
> > > +void d1 (_Float16 *restrict a, _Float16 *restrict b, _Float16 *res, int n)
> > > +{
> > > +   for (int i = 0; i < (n & -8); i+=2)
> > > +    {
> > > +      res[i+0] = a[i+0] + b[i+0];
> > > +      res[i+1] = a[i+1] - b[i+1];
> > > +    }
> > > +}
> > > +
> > > +/*
> > > +** e1:
> > > +** ...
> > > +** fadd v[0-9]+.2d, v[0-9]+.2d, v[0-9]+.2d
> > > +** fsub v[0-9]+.2d, v[0-9]+.2d, v[0-9]+.2d
> > > +** ins v[0-9]+.d\[1\], v[0-9]+.d\[1\]
> > > +** ...
> > > +*/
> > > +void e1 (double *restrict a, double *restrict b, double *res, int n)
> > > +{
> > > +   for (int i = 0; i < (n & -4); i+=2)
> > > +    {
> > > +      res[i+0] = a[i+0] + b[i+0];
> > > +      res[i+1] = a[i+1] - b[i+1];
> > > +    }
> > > +}
> > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/addsub_1.c
> > > b/gcc/testsuite/gcc.target/aarch64/sve/addsub_1.c
> > > new file mode 100644
> > > index 0000000000000000000000000000000000000000..ea7f9d9db2c8c9a3efe5c7951a314a29b7a7a922
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/aarch64/sve/addsub_1.c
> > > @@ -0,0 +1,52 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-Ofast" } */
> > > +/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
> > > +
> > > +/*
> > > +** f1:
> > > +** ...
> > > +** fneg z[0-9]+.d, p[0-9]+/m, z[0-9]+.d
> > > +** fadd z[0-9]+.s, z[0-9]+.s, z[0-9]+.s
> > > +** ...
> > > +*/
> > > +void f1 (float *restrict a, float *restrict b, float *res, int n)
> > > +{
> > > +   for (int i = 0; i < (n & -4); i+=2)
> > > +    {
> > > +      res[i+0] = a[i+0] + b[i+0];
> > > +      res[i+1] = a[i+1] - b[i+1];
> > > +    }
> > > +}
> > > +
> > > +/*
> > > +** d1:
> > > +** ...
> > > +** fneg z[0-9]+.s, p[0-9]+/m, z[0-9]+.s
> > > +** fadd z[0-9]+.h, z[0-9]+.h, z[0-9]+.h
> > > +** ...
> > > +*/
> > > +void d1 (_Float16 *restrict a, _Float16 *restrict b, _Float16 *res, int n)
> > > +{
> > > +   for (int i = 0; i < (n & -8); i+=2)
> > > +    {
> > > +      res[i+0] = a[i+0] + b[i+0];
> > > +      res[i+1] = a[i+1] - b[i+1];
> > > +    }
> > > +}
> > > +
> > > +/*
> > > +** e1:
> > > +** ...
> > > +** fsub z[0-9]+.d, z[0-9]+.d, z[0-9]+.d
> > > +** movprfx z[0-9]+.d, p[0-9]+/m, z[0-9]+.d
> > > +** fadd z[0-9]+.d, p[0-9]+/m, z[0-9]+.d, z[0-9]+.d
> > > +** ...
> > > +*/
> > > +void e1 (double *restrict a, double *restrict b, double *res, int n)
> > > +{
> > > +   for (int i = 0; i < (n & -4); i+=2)
> > > +    {
> > > +      res[i+0] = a[i+0] + b[i+0];
> > > +      res[i+1] = a[i+1] - b[i+1];
> > > +    }
> > > +}
> > >
> > >
> >
> > --
> > Richard Biener <rguent...@suse.de>
> > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461
> > Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald,
> > Boudien Moerman; HRB 36809 (AG Nuernberg)
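
For illustration only, here is a tiny standalone C sketch (not part of the
patch, and assuming a little-endian IEEE-754 target) of the scalar analogue
of the view_convert/negate/view_convert/plus sequence the new rule emits:
reinterpreting a pair of floats as one double and negating the double flips
only the sign bit of the odd-indexed float, so an ordinary add afterwards
produces the alternating add/sub that the testcases check for.

#include <stdio.h>
#include <string.h>

int main (void)
{
  float a[2] = { 1.0f, 2.0f };
  float b[2] = { 3.0f, 4.0f };

  /* view_convert:ntype -- reinterpret the two floats in b as one double.  */
  double d;
  memcpy (&d, b, sizeof d);

  /* negate -- for this finite value it only flips the double's sign bit,
     i.e. the sign of b[1], the high half on little-endian.  */
  d = -d;

  /* view_convert:type -- view the result as two floats again.  */
  float nb[2];
  memcpy (nb, &d, sizeof nb);

  /* plus -- a single add per lane now yields { a0 + b0, a1 - b1 }.  */
  float res[2] = { a[0] + nb[0], a[1] + nb[1] };

  /* Should print 4.000000 -2.000000, matching res[i+0] = a+b, res[i+1] = a-b
     in the tests above.  */
  printf ("%f %f\n", res[0], res[1]);
  return 0;
}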