> -----Original Message-----
> From: Gcc-patches <gcc-patches-bounces+tamar.christina=arm....@gcc.gnu.org> On Behalf Of Tamar Christina via Gcc-patches
> Sent: Friday, September 23, 2022 9:14 AM
> To: Richard Biener <rguent...@suse.de>
> Cc: Richard Sandiford <richard.sandif...@arm.com>; nd <n...@arm.com>; Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>; juzhe.zh...@rivai.ai
> Subject: RE: [PATCH]middle-end Add optimized float addsub without needing VEC_PERM_EXPR.
>
> > -----Original Message-----
> > From: Richard Biener <rguent...@suse.de>
> > Sent: Friday, September 23, 2022 8:54 AM
> > To: Tamar Christina <tamar.christ...@arm.com>
> > Cc: Richard Sandiford <richard.sandif...@arm.com>; Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>; nd <n...@arm.com>; juzhe.zh...@rivai.ai
> > Subject: RE: [PATCH]middle-end Add optimized float addsub without needing VEC_PERM_EXPR.
> >
> > On Fri, 23 Sep 2022, Tamar Christina wrote:
> >
> > > Hi,
> > >
> > > Attached is the respun version of the patch.
> > >
> > > > >>
> > > > >> Wouldn't a target need to re-check if lanes are NaN or denormal
> > > > >> if after a SFmode lane operation a DFmode lane operation follows?
> > > > >> IIRC that is what usually makes punning "integer" vectors as FP
> > > > >> vectors costly.
> > >
> > > I don't believe this is a problem, due to NaNs not being a single
> > > value and, according to the standard, the sign bit doesn't change the
> > > meaning of a NaN.
> > >
> > > That's why specifically for negates generally no check is performed
> > > and it's assumed that if a value is a NaN going in, it's a NaN
> > > coming out, and this optimization doesn't change that.  Also under
> > > fast-math we don't guarantee a stable representation for NaN (or
> > > zeros, etc.) afaik.
> > >
> > > So if that is still a concern I could add && !HONOR_NANS () to the
> > > constraints.
> > >
> > > Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > > 	* match.pd: Add fneg/fadd rule.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > 	* gcc.target/aarch64/simd/addsub_1.c: New test.
> > > 	* gcc.target/aarch64/sve/addsub_1.c: New test.
> > >
> > > --- inline version of patch ---
> > >
> > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > index 1bb936fc4010f98f24bb97671350e8432c55b347..2617d56091dfbd41ae49f980ee0af3757f5ec1cf 100644
> > > --- a/gcc/match.pd
> > > +++ b/gcc/match.pd
> > > @@ -7916,6 +7916,59 @@ and,
> > >   (simplify (reduc (op @0 VECTOR_CST@1))
> > >     (op (reduc:type @0) (reduc:type @1))))
> > >
> > > +/* Simplify vector floating point operations of alternating sub/add pairs
> > > +   into using an fneg of a wider element type followed by a normal add.
> > > +   under IEEE 754 the fneg of the wider type will negate every even entry
> > > +   and when doing an add we get a sub of the even and add of every odd
> > > +   elements.  */
> > > +(simplify
> > > + (vec_perm (plus:c @0 @1) (minus @0 @1) VECTOR_CST@2)
> > > + (if (!VECTOR_INTEGER_TYPE_P (type) && !BYTES_BIG_ENDIAN)
> >
> > shouldn't this be FLOAT_WORDS_BIG_ENDIAN instead?
> >
> > I'm still concerned what
> >
> >   (neg:V2DF (subreg:V2DF (reg:V4SF) 0))
> >
> > means for architectures like RISC-V.  Can one "reformat" FP values in
> > vector registers so that two floats overlap a double (and then back)?
> >
> > I suppose you rely on target_can_change_mode_class to tell you that.
>
> Indeed, the documentation says:
>
> "This hook returns true if it is possible to bitcast values held in
> registers of class rclass from mode from to mode to and if doing so
> preserves the low-order bits that are common to both modes.  The result
> is only meaningful if rclass has registers that can hold both from and
> to."
>
> This implies to me that if the bitcast shouldn't be possible the hook
> should reject it.
> Of course you always know whether something is possible, but perhaps not
> whether it's cheap to do.
>
> The specific implementation for RISC-V seems to imply to me that they
> disallow any FP conversions, so that seems to be OK.
>
> > >
> > > + (with
> > > +  {
> > > +    /* Build a vector of integers from the tree mask.  */
> > > +    vec_perm_builder builder;
> > > +    if (!tree_to_vec_perm_builder (&builder, @2))
> > > +      return NULL_TREE;
> > > +
> > > +    /* Create a vec_perm_indices for the integer vector.  */
> > > +    poly_uint64 nelts = TYPE_VECTOR_SUBPARTS (type);
> > > +    vec_perm_indices sel (builder, 2, nelts);
> > > +  }
> > > +  (if (sel.series_p (0, 2, 0, 2))
> > > +   (with
> > > +    {
> > > +      machine_mode vec_mode = TYPE_MODE (type);
> > > +      auto elem_mode = GET_MODE_INNER (vec_mode);
> > > +      auto nunits = exact_div (GET_MODE_NUNITS (vec_mode), 2);
> > > +      tree stype;
> > > +      switch (elem_mode)
> > > +        {
> > > +        case E_HFmode:
> > > +          stype = float_type_node;
> > > +          break;
> > > +        case E_SFmode:
> > > +          stype = double_type_node;
> > > +          break;
> > > +        default:
> > > +          return NULL_TREE;
> > > +        }
> >
> > Can't you use GET_MODE_WIDER_MODE and double-check the mode-size
> > doubles?  I mean you obviously miss DFmode -> TFmode.
>
> Problem is I need the type, not the mode, but even
> build_pointer_type_for_mode requires the new scalar type.  So I couldn't
> find anything to help here given that there's no inverse relationship
> between modes and types.
>
I meant build_vector_type_for_mode here.

> > > +      tree ntype = build_vector_type (stype, nunits);
> > > +      if (!ntype)
> >
> > You want to check that the above results in a vector mode.
>
> Does it?  Technically you can cast a V2SF to both a V1DF or a DF, can't you?
> Both seem equally valid here.
>
> > > +        return NULL_TREE;
> > > +
> > > +      /* The format has to have a simple sign bit.  */
> > > +      const struct real_format *fmt = FLOAT_MODE_FORMAT (vec_mode);
> > > +      if (fmt == NULL)
> > > +        return NULL_TREE;
> > > +    }
> > > +    (if (fmt->signbit_rw == GET_MODE_UNIT_BITSIZE (vec_mode) - 1
> >
> > shouldn't this be a check on the component mode?  I think you'd want
> > to check that the bigger format signbit_rw is equal to the smaller
> > format mode size plus its signbit_rw or so?
>
> Tbh, both are somewhat weak guarantees.  In a previous patch of mine I'd
> added a new field "is_ieee" to the real formats to denote that they are
> an IEEE type.  Maybe I should revive that instead?
>
> Regards,
> Tamar
>
> > > +         && fmt->signbit_rw == fmt->signbit_ro
> > > +         && targetm.can_change_mode_class (TYPE_MODE (ntype),
> > > +                                           TYPE_MODE (type), ALL_REGS)
> > > +         && (optimize_vectors_before_lowering_p ()
> > > +             || target_supports_op_p (ntype, NEGATE_EXPR, optab_vector)))
> > > +     (plus (view_convert:type (negate (view_convert:ntype @1))) @0)))))))
> > > +
> > >  (simplify
> > >   (vec_perm @0 @1 VECTOR_CST@2)
> > >   (with
> > > diff --git a/gcc/testsuite/gcc.target/aarch64/simd/addsub_1.c
> > > b/gcc/testsuite/gcc.target/aarch64/simd/addsub_1.c
> > > new file mode 100644
> > > index 0000000000000000000000000000000000000000..1fb91a34c421bbd2894faa0dbbf1b47ad43310c4
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/aarch64/simd/addsub_1.c
> > > @@ -0,0 +1,56 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-require-effective-target arm_v8_2a_fp16_neon_ok } */
> > > +/* { dg-options "-Ofast" } */
> > > +/* { dg-add-options arm_v8_2a_fp16_neon } */
> > > +/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
> > > +
> > > +#pragma GCC target "+nosve"
> > > +
> > > +/*
> > > +** f1:
> > > +** ...
> > > +** fneg v[0-9]+.2d, v[0-9]+.2d
> > > +** fadd v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> > > +** ...
> > > +*/
> > > +void f1 (float *restrict a, float *restrict b, float *res, int n)
> > > +{
> > > +   for (int i = 0; i < (n & -4); i+=2)
> > > +    {
> > > +      res[i+0] = a[i+0] + b[i+0];
> > > +      res[i+1] = a[i+1] - b[i+1];
> > > +    }
> > > +}
> > > +
> > > +/*
> > > +** d1:
> > > +** ...
> > > +** fneg v[0-9]+.4s, v[0-9]+.4s
> > > +** fadd v[0-9]+.8h, v[0-9]+.8h, v[0-9]+.8h
> > > +** ...
> > > +*/
> > > +void d1 (_Float16 *restrict a, _Float16 *restrict b, _Float16 *res, int n)
> > > +{
> > > +   for (int i = 0; i < (n & -8); i+=2)
> > > +    {
> > > +      res[i+0] = a[i+0] + b[i+0];
> > > +      res[i+1] = a[i+1] - b[i+1];
> > > +    }
> > > +}
> > > +
> > > +/*
> > > +** e1:
> > > +** ...
> > > +** fadd v[0-9]+.2d, v[0-9]+.2d, v[0-9]+.2d
> > > +** fsub v[0-9]+.2d, v[0-9]+.2d, v[0-9]+.2d
> > > +** ins v[0-9]+.d\[1\], v[0-9]+.d\[1\]
> > > +** ...
> > > +*/
> > > +void e1 (double *restrict a, double *restrict b, double *res, int n)
> > > +{
> > > +   for (int i = 0; i < (n & -4); i+=2)
> > > +    {
> > > +      res[i+0] = a[i+0] + b[i+0];
> > > +      res[i+1] = a[i+1] - b[i+1];
> > > +    }
> > > +}
> > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/addsub_1.c
> > > b/gcc/testsuite/gcc.target/aarch64/sve/addsub_1.c
> > > new file mode 100644
> > > index 0000000000000000000000000000000000000000..ea7f9d9db2c8c9a3efe5c7951a314a29b7a7a922
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/aarch64/sve/addsub_1.c
> > > @@ -0,0 +1,52 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-Ofast" } */
> > > +/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
> > > +
> > > +/*
> > > +** f1:
> > > +** ...
> > > +** fneg z[0-9]+.d, p[0-9]+/m, z[0-9]+.d
> > > +** fadd z[0-9]+.s, z[0-9]+.s, z[0-9]+.s
> > > +** ...
> > > +*/
> > > +void f1 (float *restrict a, float *restrict b, float *res, int n)
> > > +{
> > > +   for (int i = 0; i < (n & -4); i+=2)
> > > +    {
> > > +      res[i+0] = a[i+0] + b[i+0];
> > > +      res[i+1] = a[i+1] - b[i+1];
> > > +    }
> > > +}
> > > +
> > > +/*
> > > +** d1:
> > > +** ...
> > > +** fneg z[0-9]+.s, p[0-9]+/m, z[0-9]+.s
> > > +** fadd z[0-9]+.h, z[0-9]+.h, z[0-9]+.h
> > > +** ...
> > > +*/
> > > +void d1 (_Float16 *restrict a, _Float16 *restrict b, _Float16 *res, int n)
> > > +{
> > > +   for (int i = 0; i < (n & -8); i+=2)
> > > +    {
> > > +      res[i+0] = a[i+0] + b[i+0];
> > > +      res[i+1] = a[i+1] - b[i+1];
> > > +    }
> > > +}
> > > +
> > > +/*
> > > +** e1:
> > > +** ...
> > > +** fsub z[0-9]+.d, z[0-9]+.d, z[0-9]+.d
> > > +** movprfx z[0-9]+.d, p[0-9]+/m, z[0-9]+.d
> > > +** fadd z[0-9]+.d, p[0-9]+/m, z[0-9]+.d, z[0-9]+.d
> > > +** ...
> > > +*/
> > > +void e1 (double *restrict a, double *restrict b, double *res, int n)
> > > +{
> > > +   for (int i = 0; i < (n & -4); i+=2)
> > > +    {
> > > +      res[i+0] = a[i+0] + b[i+0];
> > > +      res[i+1] = a[i+1] - b[i+1];
> > > +    }
> > > +}
> > >
> > >
> >
> > --
> > Richard Biener <rguent...@suse.de>
> > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461
> > Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald,
> > Boudien Moerman; HRB 36809 (AG Nuernberg)
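
For illustration only, here is a tiny standalone C sketch (not part of the
patch, and assuming a little-endian IEEE-754 target) of the scalar analogue
of the view_convert/negate/view_convert/plus sequence the new rule emits:
reinterpreting a pair of floats as one double and negating the double flips
only the sign bit of the odd-indexed float, so an ordinary add afterwards
produces the alternating add/sub that the testcases check for.

#include <stdio.h>
#include <string.h>

int main (void)
{
  float a[2] = { 1.0f, 2.0f };
  float b[2] = { 3.0f, 4.0f };

  /* view_convert:ntype -- reinterpret the two floats in b as one double.  */
  double d;
  memcpy (&d, b, sizeof d);

  /* negate -- for this finite value it only flips the double's sign bit,
     i.e. the sign of b[1], the high half on little-endian.  */
  d = -d;

  /* view_convert:type -- view the result as two floats again.  */
  float nb[2];
  memcpy (nb, &d, sizeof nb);

  /* plus -- a single add per lane now yields { a0 + b0, a1 - b1 }.  */
  float res[2] = { a[0] + nb[0], a[1] + nb[1] };

  /* Should print 4.000000 -2.000000, matching res[i+0] = a+b, res[i+1] = a-b
     in the tests above.  */
  printf ("%f %f\n", res[0], res[1]);
  return 0;
}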