RE: [PATCH]middle-end match.pd: optimize fneg (fabs (x)) to x | (1 << signbit(x)) [PR109154]

Tamar Christina Wed, 27 Sep 2023 00:57:10 -0700

> -----Original Message-----
> From: Richard Biener <rguent...@suse.de>
> Sent: Wednesday, September 27, 2023 8:12 AM
> To: Tamar Christina <tamar.christ...@arm.com>
> Cc: Andrew Pinski <pins...@gmail.com>; gcc-patches@gcc.gnu.org; nd
> <n...@arm.com>; j...@ventanamicro.com
> Subject: RE: [PATCH]middle-end match.pd: optimize fneg (fabs (x)) to x | (1 <<
> signbit(x)) [PR109154]
> 
> On Wed, 27 Sep 2023, Tamar Christina wrote:
> 
> > > -----Original Message-----
> > > From: Andrew Pinski <pins...@gmail.com>
> > > Sent: Wednesday, September 27, 2023 2:17 AM
> > > To: Tamar Christina <tamar.christ...@arm.com>
> > > Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; rguent...@suse.de;
> > > j...@ventanamicro.com
> > > Subject: Re: [PATCH]middle-end match.pd: optimize fneg (fabs (x)) to
> > > x | (1 <<
> > > signbit(x)) [PR109154]
> > >
> > > On Tue, Sep 26, 2023 at 5:51?PM Tamar Christina
> > > <tamar.christ...@arm.com>
> > > wrote:
> > > >
> > > > Hi All,
> > > >
> > > > For targets that allow conversion between int and float modes this
> > > > adds a new optimization transforming fneg (fabs (x)) into x | (1
> > > > << signbit(x)).  Such sequences are common in scientific code
> > > > working with
> > > gradients.
> > > >
> > > > The transformed instruction if the target has an inclusive-OR that
> > > > takes an immediate is both shorter an faster.  For those that
> > > > don't the immediate has to be seperate constructed but this still
> > > > ends up being faster as the immediate construction is not on the 
> > > > critical
> path.
> > > >
> > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > >
> > > > Ok for master?
> > >
> > > I think this should be part of isel instead of match.
> > > Maybe we could use genmatch to generate the code that does the
> > > transformations but this does not belong as part of match really.
> >
> > I disagree.. I don't think this belongs in isel. Isel is for structural
> transformations.
> > If there is a case for something else I'd imagine backwardprop is a better
> choice.
> >
> > But I don't see why it doesn't belong here considering it *is* a
> > mathematical optimization and the file has plenty of transformations
> > such as mask optimizations and vector conditional rewriting.
> 
> But the mathematical transform would more generally be fneg (fabs (x)) ->
> copysign (x, -1.) and that can be optimally expanded at RTL expansion time?


Ah sure, atm I did copysign (x, -1) -> x | 1 << signbits.  I can do it the 
other way
around.  And I guess since copysign (-x, y), copysign(|x|, y) -> copysign (x, 
y) that
should solve the trigonometry problem too.

Cool will do that instead, thanks!

Tamar

> 
> Richard.
> 
> > Regards,
> > Tamar
> >
> > >
> > > Thanks,
> > > Andrew
> > >
> > > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >         PR tree-optimization/109154
> > > >         * match.pd: Add new neg+abs rule.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > >         PR tree-optimization/109154
> > > >         * gcc.target/aarch64/fneg-abs_1.c: New test.
> > > >         * gcc.target/aarch64/fneg-abs_2.c: New test.
> > > >         * gcc.target/aarch64/fneg-abs_3.c: New test.
> > > >         * gcc.target/aarch64/fneg-abs_4.c: New test.
> > > >         * gcc.target/aarch64/sve/fneg-abs_1.c: New test.
> > > >         * gcc.target/aarch64/sve/fneg-abs_2.c: New test.
> > > >         * gcc.target/aarch64/sve/fneg-abs_3.c: New test.
> > > >         * gcc.target/aarch64/sve/fneg-abs_4.c: New test.
> > > >
> > > > --- inline copy of patch --
> > > > diff --git a/gcc/match.pd b/gcc/match.pd index
> > > >
> > >
> 39c7ea1088f25538ed8bd26ee89711566141a71f..8ebde06dcd4b26d69482
> > > 6cffad0f
> > > > b17e1136600a 100644
> > > > --- a/gcc/match.pd
> > > > +++ b/gcc/match.pd
> > > > @@ -9476,3 +9476,57 @@ and,
> > > >        }
> > > >        (if (full_perm_p)
> > > >         (vec_perm (op@3 @0 @1) @3 @2))))))
> > > > +
> > > > +/* Transform fneg (fabs (X)) -> X | 1 << signbit (X).  */
> > > > +
> > > > +(simplify
> > > > + (negate (abs @0))
> > > > + (if (FLOAT_TYPE_P (type)
> > > > +      /* We have to delay this rewriting till after forward prop
> > > > +because
> > > otherwise
> > > > +        it's harder to do trigonometry optimizations. e.g. 
> > > > cos(-fabs(x)) is not
> > > > +        matched in one go.  Instead cos (-x) is matched first
> > > > + followed by
> > > cos(|x|).
> > > > +        The bottom op approach makes this rule match first and it's not
> untill
> > > > +        fwdprop that we match top down.  There are manu such
> > > > + simplications
> > > so we
> > > > +        delay this optimization till later on.  */
> > > > +      && canonicalize_math_after_vectorization_p ())  (with {
> > > > +    tree itype = unsigned_type_for (type);
> > > > +    machine_mode mode = TYPE_MODE (type);
> > > > +    const struct real_format *float_fmt = FLOAT_MODE_FORMAT (mode);
> > > > +    auto optab = VECTOR_TYPE_P (type) ? optab_vector : optab_default; }
> > > > +   (if (float_fmt
> > > > +       && float_fmt->signbit_rw >= 0
> > > > +       && targetm.can_change_mode_class (TYPE_MODE (itype),
> > > > +                                         TYPE_MODE (type), ALL_REGS)
> > > > +        && target_supports_op_p (itype, BIT_IOR_EXPR, optab))
> > > > +    (with { wide_int wone = wi::one (element_precision (type));
> > > > +           int sbit = float_fmt->signbit_rw;
> > > > +           auto stype = VECTOR_TYPE_P (type) ? TREE_TYPE (itype) : 
> > > > itype;
> > > > +           tree sign_bit = wide_int_to_tree (stype, wi::lshift (wone, 
> > > > sbit));}
> > > > +     (view_convert:type
> > > > +      (bit_ior (view_convert:itype @0)
> > > > +              { build_uniform_cst (itype, sign_bit); } )))))))
> > > > +
> > > > +/* Repeat the same but for conditional negate.  */
> > > > +
> > > > +(simplify
> > > > + (IFN_COND_NEG @1 (abs @0) @2)
> > > > + (if (FLOAT_TYPE_P (type))
> > > > +  (with {
> > > > +    tree itype = unsigned_type_for (type);
> > > > +    machine_mode mode = TYPE_MODE (type);
> > > > +    const struct real_format *float_fmt = FLOAT_MODE_FORMAT (mode);
> > > > +    auto optab = VECTOR_TYPE_P (type) ? optab_vector : optab_default; }
> > > > +   (if (float_fmt
> > > > +       && float_fmt->signbit_rw >= 0
> > > > +       && targetm.can_change_mode_class (TYPE_MODE (itype),
> > > > +                                         TYPE_MODE (type), ALL_REGS)
> > > > +        && target_supports_op_p (itype, BIT_IOR_EXPR, optab))
> > > > +    (with { wide_int wone = wi::one (element_precision (type));
> > > > +           int sbit = float_fmt->signbit_rw;
> > > > +           auto stype = VECTOR_TYPE_P (type) ? TREE_TYPE (itype) : 
> > > > itype;
> > > > +           tree sign_bit = wide_int_to_tree (stype, wi::lshift (wone, 
> > > > sbit));}
> > > > +     (view_convert:type
> > > > +      (IFN_COND_IOR @1 (view_convert:itype @0)
> > > > +              { build_uniform_cst (itype, sign_bit); }
> > > > +              (view_convert:itype @2) )))))))
> > > > \ No newline at end of file
> > > > diff --git a/gcc/testsuite/gcc.target/aarch64/fneg-abs_1.c
> > > > b/gcc/testsuite/gcc.target/aarch64/fneg-abs_1.c
> > > > new file mode 100644
> > > > index
> > > >
> > >
> 0000000000000000000000000000000000000000..f823013c3ddf6b3a266
> > > c3abfcbf2
> > > > 642fc2a75fa6
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/aarch64/fneg-abs_1.c
> > > > @@ -0,0 +1,39 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O3" } */
> > > > +/* { dg-final { check-function-bodies "**" "" "" { target lp64 }
> > > > +} } */
> > > > +
> > > > +#pragma GCC target "+nosve"
> > > > +
> > > > +#include <arm_neon.h>
> > > > +
> > > > +/*
> > > > +** t1:
> > > > +**     orr     v[0-9]+.2s, #128, lsl #24
> > > > +**     ret
> > > > +*/
> > > > +float32x2_t t1 (float32x2_t a)
> > > > +{
> > > > +  return vneg_f32 (vabs_f32 (a)); }
> > > > +
> > > > +/*
> > > > +** t2:
> > > > +**     orr     v[0-9]+.4s, #128, lsl #24
> > > > +**     ret
> > > > +*/
> > > > +float32x4_t t2 (float32x4_t a)
> > > > +{
> > > > +  return vnegq_f32 (vabsq_f32 (a)); }
> > > > +
> > > > +/*
> > > > +** t3:
> > > > +**     adrp    x0, .LC[0-9]+
> > > > +**     ldr     q[0-9]+, \[x0, #:lo12:.LC0\]
> > > > +**     orr     v[0-9]+.16b, v[0-9]+.16b, v[0-9]+.16b
> > > > +**     ret
> > > > +*/
> > > > +float64x2_t t3 (float64x2_t a)
> > > > +{
> > > > +  return vnegq_f64 (vabsq_f64 (a)); }
> > > > diff --git a/gcc/testsuite/gcc.target/aarch64/fneg-abs_2.c
> > > > b/gcc/testsuite/gcc.target/aarch64/fneg-abs_2.c
> > > > new file mode 100644
> > > > index
> > > >
> > >
> 0000000000000000000000000000000000000000..141121176b309e4b2a
> > > a413dc5527
> > > > 1a6e3c93d5e1
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/aarch64/fneg-abs_2.c
> > > > @@ -0,0 +1,31 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O3" } */
> > > > +/* { dg-final { check-function-bodies "**" "" "" { target lp64 }
> > > > +} } */
> > > > +
> > > > +#pragma GCC target "+nosve"
> > > > +
> > > > +#include <arm_neon.h>
> > > > +#include <math.h>
> > > > +
> > > > +/*
> > > > +** f1:
> > > > +**     movi    v[0-9]+.2s, 0x80, lsl 24
> > > > +**     orr     v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
> > > > +**     ret
> > > > +*/
> > > > +float32_t f1 (float32_t a)
> > > > +{
> > > > +  return -fabsf (a);
> > > > +}
> > > > +
> > > > +/*
> > > > +** f2:
> > > > +**     mov     x0, -9223372036854775808
> > > > +**     fmov    d[0-9]+, x0
> > > > +**     orr     v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
> > > > +**     ret
> > > > +*/
> > > > +float64_t f2 (float64_t a)
> > > > +{
> > > > +  return -fabs (a);
> > > > +}
> > > > diff --git a/gcc/testsuite/gcc.target/aarch64/fneg-abs_3.c
> > > > b/gcc/testsuite/gcc.target/aarch64/fneg-abs_3.c
> > > > new file mode 100644
> > > > index
> > > >
> > >
> 0000000000000000000000000000000000000000..b4652173a95d104ddf
> > > a70c497f06
> > > > 27a61ea89d3b
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/aarch64/fneg-abs_3.c
> > > > @@ -0,0 +1,36 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O3" } */
> > > > +/* { dg-final { check-function-bodies "**" "" "" { target lp64 }
> > > > +} } */
> > > > +
> > > > +#pragma GCC target "+nosve"
> > > > +
> > > > +#include <arm_neon.h>
> > > > +#include <math.h>
> > > > +
> > > > +/*
> > > > +** f1:
> > > > +**     ...
> > > > +**     ldr     q[0-9]+, \[x0\]
> > > > +**     orr     v[0-9]+.4s, #128, lsl #24
> > > > +**     str     q[0-9]+, \[x0\], 16
> > > > +**     ...
> > > > +*/
> > > > +void f1 (float32_t *a, int n)
> > > > +{
> > > > +  for (int i = 0; i < (n & -8); i++)
> > > > +   a[i] = -fabsf (a[i]);
> > > > +}
> > > > +
> > > > +/*
> > > > +** f2:
> > > > +**     ...
> > > > +**     ldr     q[0-9]+, \[x0\]
> > > > +**     orr     v[0-9]+.16b, v[0-9]+.16b, v[0-9]+.16b
> > > > +**     str     q[0-9]+, \[x0\], 16
> > > > +**     ...
> > > > +*/
> > > > +void f2 (float64_t *a, int n)
> > > > +{
> > > > +  for (int i = 0; i < (n & -8); i++)
> > > > +   a[i] = -fabs (a[i]);
> > > > +}
> > > > diff --git a/gcc/testsuite/gcc.target/aarch64/fneg-abs_4.c
> > > > b/gcc/testsuite/gcc.target/aarch64/fneg-abs_4.c
> > > > new file mode 100644
> > > > index
> > > >
> > >
> 0000000000000000000000000000000000000000..10879dea74462d34b2
> > > 6160eeb0bd
> > > > 54ead063166b
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/aarch64/fneg-abs_4.c
> > > > @@ -0,0 +1,39 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O3" } */
> > > > +/* { dg-final { check-function-bodies "**" "" "" { target lp64 }
> > > > +} } */
> > > > +
> > > > +#pragma GCC target "+nosve"
> > > > +
> > > > +#include <string.h>
> > > > +
> > > > +/*
> > > > +** negabs:
> > > > +**     mov     x0, -9223372036854775808
> > > > +**     fmov    d[0-9]+, x0
> > > > +**     orr     v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
> > > > +**     ret
> > > > +*/
> > > > +double negabs (double x)
> > > > +{
> > > > +   unsigned long long y;
> > > > +   memcpy (&y, &x, sizeof(double));
> > > > +   y = y | (1UL << 63);
> > > > +   memcpy (&x, &y, sizeof(double));
> > > > +   return x;
> > > > +}
> > > > +
> > > > +/*
> > > > +** negabsf:
> > > > +**     movi    v[0-9]+.2s, 0x80, lsl 24
> > > > +**     orr     v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
> > > > +**     ret
> > > > +*/
> > > > +float negabsf (float x)
> > > > +{
> > > > +   unsigned int y;
> > > > +   memcpy (&y, &x, sizeof(float));
> > > > +   y = y | (1U << 31);
> > > > +   memcpy (&x, &y, sizeof(float));
> > > > +   return x;
> > > > +}
> > > > +
> > > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_1.c
> > > > b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_1.c
> > > > new file mode 100644
> > > > index
> > > >
> > >
> 0000000000000000000000000000000000000000..0c7664e6de77a49768
> > > 2952653ffd
> > > > 417453854d52
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_1.c
> > > > @@ -0,0 +1,37 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O3" } */
> > > > +/* { dg-final { check-function-bodies "**" "" "" { target lp64 }
> > > > +} } */
> > > > +
> > > > +#include <arm_neon.h>
> > > > +
> > > > +/*
> > > > +** t1:
> > > > +**     orr     v[0-9]+.2s, #128, lsl #24
> > > > +**     ret
> > > > +*/
> > > > +float32x2_t t1 (float32x2_t a)
> > > > +{
> > > > +  return vneg_f32 (vabs_f32 (a)); }
> > > > +
> > > > +/*
> > > > +** t2:
> > > > +**     orr     v[0-9]+.4s, #128, lsl #24
> > > > +**     ret
> > > > +*/
> > > > +float32x4_t t2 (float32x4_t a)
> > > > +{
> > > > +  return vnegq_f32 (vabsq_f32 (a)); }
> > > > +
> > > > +/*
> > > > +** t3:
> > > > +**     adrp    x0, .LC[0-9]+
> > > > +**     ldr     q[0-9]+, \[x0, #:lo12:.LC0\]
> > > > +**     orr     v[0-9]+.16b, v[0-9]+.16b, v[0-9]+.16b
> > > > +**     ret
> > > > +*/
> > > > +float64x2_t t3 (float64x2_t a)
> > > > +{
> > > > +  return vnegq_f64 (vabsq_f64 (a)); }
> > > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_2.c
> > > > b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_2.c
> > > > new file mode 100644
> > > > index
> > > >
> > >
> 0000000000000000000000000000000000000000..a60cd31b9294af2dac6
> > > 9eed1c93f
> > > > 899bd5c78fca
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_2.c
> > > > @@ -0,0 +1,29 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O3" } */
> > > > +/* { dg-final { check-function-bodies "**" "" "" { target lp64 }
> > > > +} } */
> > > > +
> > > > +#include <arm_neon.h>
> > > > +#include <math.h>
> > > > +
> > > > +/*
> > > > +** f1:
> > > > +**     movi    v[0-9]+.2s, 0x80, lsl 24
> > > > +**     orr     v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
> > > > +**     ret
> > > > +*/
> > > > +float32_t f1 (float32_t a)
> > > > +{
> > > > +  return -fabsf (a);
> > > > +}
> > > > +
> > > > +/*
> > > > +** f2:
> > > > +**     mov     x0, -9223372036854775808
> > > > +**     fmov    d[0-9]+, x0
> > > > +**     orr     v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
> > > > +**     ret
> > > > +*/
> > > > +float64_t f2 (float64_t a)
> > > > +{
> > > > +  return -fabs (a);
> > > > +}
> > > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_3.c
> > > > b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_3.c
> > > > new file mode 100644
> > > > index
> > > >
> > >
> 0000000000000000000000000000000000000000..1bf34328d8841de8e6
> > > b0a5458562
> > > > a9f00e31c275
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_3.c
> > > > @@ -0,0 +1,34 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O3" } */
> > > > +/* { dg-final { check-function-bodies "**" "" "" { target lp64 }
> > > > +} } */
> > > > +
> > > > +#include <arm_neon.h>
> > > > +#include <math.h>
> > > > +
> > > > +/*
> > > > +** f1:
> > > > +**     ...
> > > > +**     ld1w    z[0-9]+.s, p[0-9]+/z, \[x0, x2, lsl 2\]
> > > > +**     orr     z[0-9]+.s, z[0-9]+.s, #0x80000000
> > > > +**     st1w    z[0-9]+.s, p[0-9]+, \[x0, x2, lsl 2\]
> > > > +**     ...
> > > > +*/
> > > > +void f1 (float32_t *a, int n)
> > > > +{
> > > > +  for (int i = 0; i < (n & -8); i++)
> > > > +   a[i] = -fabsf (a[i]);
> > > > +}
> > > > +
> > > > +/*
> > > > +** f2:
> > > > +**     ...
> > > > +**     ld1d    z[0-9]+.d, p[0-9]+/z, \[x0, x2, lsl 3\]
> > > > +**     orr     z[0-9]+.d, z[0-9]+.d, #0x8000000000000000
> > > > +**     st1d    z[0-9]+.d, p[0-9]+, \[x0, x2, lsl 3\]
> > > > +**     ...
> > > > +*/
> > > > +void f2 (float64_t *a, int n)
> > > > +{
> > > > +  for (int i = 0; i < (n & -8); i++)
> > > > +   a[i] = -fabs (a[i]);
> > > > +}
> > > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_4.c
> > > > b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_4.c
> > > > new file mode 100644
> > > > index
> > > >
> > >
> 0000000000000000000000000000000000000000..21f2a8da2a5d44e3d0
> > > 1f6604ca7b
> > > > e87e3744d494
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_4.c
> > > > @@ -0,0 +1,37 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O3" } */
> > > > +/* { dg-final { check-function-bodies "**" "" "" { target lp64 }
> > > > +} } */
> > > > +
> > > > +#include <string.h>
> > > > +
> > > > +/*
> > > > +** negabs:
> > > > +**     mov     x0, -9223372036854775808
> > > > +**     fmov    d[0-9]+, x0
> > > > +**     orr     v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
> > > > +**     ret
> > > > +*/
> > > > +double negabs (double x)
> > > > +{
> > > > +   unsigned long long y;
> > > > +   memcpy (&y, &x, sizeof(double));
> > > > +   y = y | (1UL << 63);
> > > > +   memcpy (&x, &y, sizeof(double));
> > > > +   return x;
> > > > +}
> > > > +
> > > > +/*
> > > > +** negabsf:
> > > > +**     movi    v[0-9]+.2s, 0x80, lsl 24
> > > > +**     orr     v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
> > > > +**     ret
> > > > +*/
> > > > +float negabsf (float x)
> > > > +{
> > > > +   unsigned int y;
> > > > +   memcpy (&y, &x, sizeof(float));
> > > > +   y = y | (1U << 31);
> > > > +   memcpy (&x, &y, sizeof(float));
> > > > +   return x;
> > > > +}
> > > > +
> > > >
> > > >
> > > >
> > > >
> > > > --
> >
> 
> --
> Richard Biener <rguent...@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> Nuernberg)

RE: [PATCH]middle-end match.pd: optimize fneg (fabs (x)) to x | (1 << signbit(x)) [PR109154]

Reply via email to