Richard Biener <rguent...@suse.de> writes:
> On Thu, 5 Oct 2023, Tamar Christina wrote:
>
>> > I suppose the idea is that -abs(x) might be easier to optimize with other
>> > patterns (consider a - copysign(x,...), optimizing to a + abs(x)).
>> > 
>> > For abs vs copysign it's a canonicalization, but (negate (abs @0)) is less
>> > canonical than copysign.
>> > 
>> > > Should I try removing this?
>> > 
>> > I'd say yes (and put the reverse canonicalization next to this pattern).
>> > 
>> 
>> This patch transforms fneg (fabs (x)) into copysign (x, -1) which is more
>> canonical and allows a target to expand this sequence efficiently.  Such
>> sequences are common in scientific code working with gradients.
>> 
>> Various optimizations in match.pd only happened on COPYSIGN but not
>> COPYSIGN_ALL, which means they exclude IFN_COPYSIGN.  COPYSIGN, however,
>> is restricted to only
>
> That's not true:
>
> (define_operator_list COPYSIGN
>     BUILT_IN_COPYSIGNF
>     BUILT_IN_COPYSIGN
>     BUILT_IN_COPYSIGNL
>     IFN_COPYSIGN)
>
> but they miss the extended float builtin variants like
> __builtin_copysignf16.  Also see below
>
>> the C99 builtins and so doesn't work for vectors.
>> 
>> The patch expands these optimizations to work on COPYSIGN_ALL.
>> 
>> There is an existing canonicalization of copysign (x, -1) to fneg (fabs (x))
>> which I remove since this is a less efficient form.  The testsuite is also
>> updated in light of this.
>> 
>> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
>> 
>> Ok for master?
>> 
>> Thanks,
>> Tamar
>> 
>> gcc/ChangeLog:
>> 
>>      PR tree-optimization/109154
>>      * match.pd: Add new neg+abs rule, remove inverse copysign rule and
>>      expand existing copysign optimizations.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>      PR tree-optimization/109154
>>      * gcc.dg/fold-copysign-1.c: Updated.
>>      * gcc.dg/pr55152-2.c: Updated.
>>      * gcc.dg/tree-ssa/abs-4.c: Updated.
>>      * gcc.dg/tree-ssa/backprop-6.c: Updated.
>>      * gcc.dg/tree-ssa/copy-sign-2.c: Updated.
>>      * gcc.dg/tree-ssa/mult-abs-2.c: Updated.
>>      * gcc.target/aarch64/fneg-abs_1.c: New test.
>>      * gcc.target/aarch64/fneg-abs_2.c: New test.
>>      * gcc.target/aarch64/fneg-abs_3.c: New test.
>>      * gcc.target/aarch64/fneg-abs_4.c: New test.
>>      * gcc.target/aarch64/sve/fneg-abs_1.c: New test.
>>      * gcc.target/aarch64/sve/fneg-abs_2.c: New test.
>>      * gcc.target/aarch64/sve/fneg-abs_3.c: New test.
>>      * gcc.target/aarch64/sve/fneg-abs_4.c: New test.
>> 
>> --- inline copy of patch ---
>> 
>> diff --git a/gcc/match.pd b/gcc/match.pd
>> index 
>> 4bdd83e6e061b16dbdb2845b9398fcfb8a6c9739..bd6599d36021e119f51a4928354f580ffe82c6e2
>>  100644
>> --- a/gcc/match.pd
>> +++ b/gcc/match.pd
>> @@ -1074,45 +1074,43 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>>  
>>  /* cos(copysign(x, y)) -> cos(x).  Similarly for cosh.  */
>>  (for coss (COS COSH)
>> -     copysigns (COPYSIGN)
>> - (simplify
>> -  (coss (copysigns @0 @1))
>> -   (coss @0)))
>> + (for copysigns (COPYSIGN_ALL)
>
> So this ends up generating for example the match
> (cosf (copysignl ...)) which doesn't make much sense.
>
> The lock-step iteration did
> (cosf (copysignf ..)) ... (ifn_cos (ifn_copysign ...))
> which is leaner but misses the case of
> (cosf (ifn_copysign ..)) - that's probably what you are
> after with this change.
>
> That said, there isn't a nice solution (without altering the match.pd
> IL).  There's the explicit solution, spelling out all combinations.
>
> So if we want to go with your pragmatic solution, changing this
> to use COPYSIGN_ALL isn't necessary; only changing the lock-step
> for iteration to a cross-product for iteration is.
>
> Changing just this pattern to
>
> (for coss (COS COSH)
>  (for copysigns (COPYSIGN)
>   (simplify
>    (coss (copysigns @0 @1))
>    (coss @0))))
>
> increases the total number of gimple-match-x.cc lines from
> 234988 to 235324.

I guess the difference between this and the later suggestions is that
this one allows builtin copysign to be paired with ifn cos, which could
be useful in other situations.  (It isn't here, because ifn_cos is
rarely provided.)  How much of the growth is due to that, and how much
of it is from nonsensical combinations like
(builtin_cosf (builtin_copysignl ...))?

If it's mostly from nonsensical combinations, would it be possible
to make genmatch drop them?
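To make the combinatorics concrete, here is a small Python sketch. It is purely illustrative, not genmatch itself; the operator lists and the mode-suffix heuristic are assumptions for the example. It contrasts the lock-step pairing with the cross product and counts how many cross-product combinations mix float modes nonsensically:

```python
from itertools import product

# Hypothetical operator lists, modelled on the COS/COPYSIGN lists quoted above.
COS = ["BUILT_IN_COSF", "BUILT_IN_COS", "BUILT_IN_COSL", "IFN_COS"]
COPYSIGN = ["BUILT_IN_COPYSIGNF", "BUILT_IN_COPYSIGN",
            "BUILT_IN_COPYSIGNL", "IFN_COPYSIGN"]

# Lock-step iteration: pairs the lists element-wise, as the original
# (for coss (COS) copysigns (COPYSIGN) ...) form does.
lock_step = list(zip(COS, COPYSIGN))

# Cross-product iteration: every cos variant with every copysign variant,
# as the nested (for ... (for ...)) rewrite does.
cross = list(product(COS, COPYSIGN))

def mode(op):
    """Crude mode-suffix heuristic: 'F', 'L', or '' (double); IFNs match any mode."""
    if op.startswith("IFN_"):
        return None
    return op[-1] if op[-1] in "FL" else ""

# A combination is "sensible" if the modes agree or either op is an IFN.
sensible = [(a, b) for a, b in cross
            if mode(a) is None or mode(b) is None or mode(a) == mode(b)]

print(len(lock_step), len(cross), len(sensible))  # prints: 4 16 10
```

Under this toy model, 6 of the 16 cross-product patterns are mode-mismatched builtin pairs, which is the kind of combination genmatch could conceivably filter.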

> The alternative is to do
>
> (for coss (COS COSH)
>      copysigns (COPYSIGN)
>  (simplify
>   (coss (copysigns @0 @1))
>    (coss @0))
>  (simplify
>   (coss (IFN_COPYSIGN @0 @1))
>    (coss @0)))
>
> which properly will diagnose a duplicate pattern.  There are
> currently no operator lists with just builtins defined (that
> could be fixed, see gencfn-macros.cc); supposing we had
> COS_C we could do
>
> (for coss (COS_C COSH_C IFN_COS IFN_COSH)
>      copysigns (COPYSIGN_C COPYSIGN_C IFN_COPYSIGN IFN_COPYSIGN 
> IFN_COPYSIGN IFN_COPYSIGN IFN_COPYSIGN IFN_COPYSIGN IFN_COPYSIGN 
> IFN_COPYSIGN)
>  (simplify
>   (coss (copysigns @0 @1))
>    (coss @0)))
>
> which of course still looks ugly ;) (some syntax extension like
> allowing to specify IFN_COPYSIGN*8 would be nice here and easy
> enough to do)
>
> Can you split out the part changing COPYSIGN to COPYSIGN_ALL,
> re-do it to only split the fors, keeping COPYSIGN and provide
> some statistics on the gimple-match-* size?  I think this might
> be the pragmatic solution for now.
>
> Richard - can you think of a clever way to express the desired
> iteration?  How do RTL macro iterations address cases like this?

I don't think .md files have an equivalent construct, unfortunately.
(I also regret some of the choices I made for .md iterators, but that's
another story.)

Perhaps an alternative to the *8 thing would be "IFN_COPYSIGN...",
with the "..." meaning "fill to match the longest operator list
in the loop".
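As a sketch of what that hypothetical "..." filler could mean (the function name and expansion rule below are assumptions for illustration, not existing genmatch behaviour): pad any list whose last entry carries the "..." marker by repeating that operator until it matches the longest list in the loop:

```python
def expand_lists(lists):
    """Pad each operator list whose last entry ends with '...' by
    repeating that operator to the length of the longest list."""
    longest = max(len(l) for l in lists)
    out = []
    for l in lists:
        if l and l[-1].endswith("..."):
            base = l[-1][:-3]
            out.append(l[:-1] + [base] * (longest - len(l) + 1))
        else:
            out.append(l)
    return out

coss = ["COS_C", "COSH_C", "IFN_COS", "IFN_COSH"]
copysigns = ["IFN_COPYSIGN..."]
# Expands copysigns to four IFN_COPYSIGN entries, matching coss.
print(expand_lists([coss, copysigns]))
```

That would let the eight-fold repetition in the COPYSIGN_C example above collapse to a single "IFN_COPYSIGN..." entry.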

Thanks,
Richard

> Richard.
>
>> +  (simplify
>> +   (coss (copysigns @0 @1))
>> +    (coss @0))))
>>  
>>  /* pow(copysign(x, y), z) -> pow(x, z) if z is an even integer.  */
>>  (for pows (POW)
>> -     copysigns (COPYSIGN)
>> - (simplify
>> -  (pows (copysigns @0 @2) REAL_CST@1)
>> -  (with { HOST_WIDE_INT n; }
>> -   (if (real_isinteger (&TREE_REAL_CST (@1), &n) && (n & 1) == 0)
>> -    (pows @0 @1)))))
>> + (for copysigns (COPYSIGN_ALL)
>> +  (simplify
>> +   (pows (copysigns @0 @2) REAL_CST@1)
>> +   (with { HOST_WIDE_INT n; }
>> +    (if (real_isinteger (&TREE_REAL_CST (@1), &n) && (n & 1) == 0)
>> +     (pows @0 @1))))))
>>  /* Likewise for powi.  */
>>  (for pows (POWI)
>> -     copysigns (COPYSIGN)
>> - (simplify
>> -  (pows (copysigns @0 @2) INTEGER_CST@1)
>> -  (if ((wi::to_wide (@1) & 1) == 0)
>> -   (pows @0 @1))))
>> + (for copysigns (COPYSIGN_ALL)
>> +  (simplify
>> +   (pows (copysigns @0 @2) INTEGER_CST@1)
>> +   (if ((wi::to_wide (@1) & 1) == 0)
>> +    (pows @0 @1)))))
>>  
>>  (for hypots (HYPOT)
>> -     copysigns (COPYSIGN)
>> - /* hypot(copysign(x, y), z) -> hypot(x, z).  */
>> - (simplify
>> -  (hypots (copysigns @0 @1) @2)
>> -  (hypots @0 @2))
>> - /* hypot(x, copysign(y, z)) -> hypot(x, y).  */
>> - (simplify
>> -  (hypots @0 (copysigns @1 @2))
>> -  (hypots @0 @1)))
>> + (for copysigns (COPYSIGN)
>> +  /* hypot(copysign(x, y), z) -> hypot(x, z).  */
>> +  (simplify
>> +   (hypots (copysigns @0 @1) @2)
>> +   (hypots @0 @2))
>> +  /* hypot(x, copysign(y, z)) -> hypot(x, y).  */
>> +  (simplify
>> +   (hypots @0 (copysigns @1 @2))
>> +   (hypots @0 @1))))
>>  
>> -/* copysign(x, CST) -> [-]abs (x).  */
>> -(for copysigns (COPYSIGN_ALL)
>> - (simplify
>> -  (copysigns @0 REAL_CST@1)
>> -  (if (REAL_VALUE_NEGATIVE (TREE_REAL_CST (@1)))
>> -   (negate (abs @0))
>> -   (abs @0))))
>> +/* Transform fneg (fabs (X)) -> copysign (X, -1).  */
>> +
>> +(simplify
>> + (negate (abs @0))
>> + (IFN_COPYSIGN @0 { build_minus_one_cst (type); }))
>>  
>>  /* copysign(copysign(x, y), z) -> copysign(x, z).  */
>>  (for copysigns (COPYSIGN_ALL)
>> diff --git a/gcc/testsuite/gcc.dg/fold-copysign-1.c 
>> b/gcc/testsuite/gcc.dg/fold-copysign-1.c
>> index 
>> f17d65c24ee4dca9867827d040fe0a404c515e7b..f9cafd14ab05f5e8ab2f6f68e62801d21c2df6a6
>>  100644
>> --- a/gcc/testsuite/gcc.dg/fold-copysign-1.c
>> +++ b/gcc/testsuite/gcc.dg/fold-copysign-1.c
>> @@ -12,5 +12,5 @@ double bar (double x)
>>    return __builtin_copysign (x, minuszero);
>>  }
>>  
>> -/* { dg-final { scan-tree-dump-times "= -" 1 "cddce1" } } */
>> -/* { dg-final { scan-tree-dump-times "= ABS_EXPR" 2 "cddce1" } } */
>> +/* { dg-final { scan-tree-dump-times "__builtin_copysign" 1 "cddce1" } } */
>> +/* { dg-final { scan-tree-dump-times "= ABS_EXPR" 1 "cddce1" } } */
>> diff --git a/gcc/testsuite/gcc.dg/pr55152-2.c 
>> b/gcc/testsuite/gcc.dg/pr55152-2.c
>> index 
>> 54db0f2062da105a829d6690ac8ed9891fe2b588..605f202ed6bc7aa8fe921457b02ff0b88cc63ce6
>>  100644
>> --- a/gcc/testsuite/gcc.dg/pr55152-2.c
>> +++ b/gcc/testsuite/gcc.dg/pr55152-2.c
>> @@ -10,4 +10,5 @@ int f(int a)
>>    return (a<-a)?a:-a;
>>  }
>>  
>> -/* { dg-final { scan-tree-dump-times "ABS_EXPR" 2 "optimized" } } */
>> +/* { dg-final { scan-tree-dump-times "\.COPYSIGN" 1 "optimized" } } */
>> +/* { dg-final { scan-tree-dump-times "ABS_EXPR" 1 "optimized" } } */
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/abs-4.c 
>> b/gcc/testsuite/gcc.dg/tree-ssa/abs-4.c
>> index 
>> 6197519faf7b55aed7bc162cd0a14dd2145210ca..e1b825f37f69ac3c4666b3a52d733368805ad31d
>>  100644
>> --- a/gcc/testsuite/gcc.dg/tree-ssa/abs-4.c
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/abs-4.c
>> @@ -9,5 +9,6 @@ long double abs_ld(long double x) { return 
>> __builtin_signbit(x) ? x : -x; }
>>  
>>  /* __builtin_signbit(x) ? x : -x. Should be convert into - ABS_EXP<x> */
>>  /* { dg-final { scan-tree-dump-not "signbit" "optimized"} } */
>> -/* { dg-final { scan-tree-dump-times "= ABS_EXPR" 3 "optimized"} } */
>> -/* { dg-final { scan-tree-dump-times "= -" 3 "optimized"} } */
>> +/* { dg-final { scan-tree-dump-times "= ABS_EXPR" 1 "optimized"} } */
>> +/* { dg-final { scan-tree-dump-times "= -" 1 "optimized"} } */
>> +/* { dg-final { scan-tree-dump-times "= \.COPYSIGN" 2 "optimized"} } */
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c 
>> b/gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c
>> index 
>> 31f05716f1498dc709cac95fa20fb5796642c77e..c3a138642d6ff7be984e91fa1343cb2718db7ae1
>>  100644
>> --- a/gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c
>> @@ -26,5 +26,6 @@ TEST_FUNCTION (float, f)
>>  TEST_FUNCTION (double, )
>>  TEST_FUNCTION (long double, l)
>>  
>> -/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -} 6 "backprop" } } 
>> */
>> -/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = ABS_EXPR <} 3 
>> "backprop" } } */
>> +/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -} 4 "backprop" } } 
>> */
>> +/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = \.COPYSIGN} 2 
>> "backprop" } } */
>> +/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = ABS_EXPR <} 1 
>> "backprop" } } */
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/copy-sign-2.c 
>> b/gcc/testsuite/gcc.dg/tree-ssa/copy-sign-2.c
>> index 
>> de52c5f7c8062958353d91f5031193defc9f3f91..e5d565c4b9832c00106588ef411fbd8c292a5cad
>>  100644
>> --- a/gcc/testsuite/gcc.dg/tree-ssa/copy-sign-2.c
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/copy-sign-2.c
>> @@ -10,4 +10,5 @@ float f1(float x)
>>    float t = __builtin_copysignf (1.0f, -x);
>>    return x * t;
>>  }
>> -/* { dg-final { scan-tree-dump-times "ABS" 2 "optimized"} } */
>> +/* { dg-final { scan-tree-dump-times "ABS" 1 "optimized"} } */
>> +/* { dg-final { scan-tree-dump-times ".COPYSIGN" 1 "optimized"} } */
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/mult-abs-2.c 
>> b/gcc/testsuite/gcc.dg/tree-ssa/mult-abs-2.c
>> index 
>> a41f1baf25669a4fd301a586a49ba5e3c5b966ab..a22896b21c8b5a4d5d8e28bd8ae0db896e63ade0
>>  100644
>> --- a/gcc/testsuite/gcc.dg/tree-ssa/mult-abs-2.c
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/mult-abs-2.c
>> @@ -34,4 +34,5 @@ float i1(float x)
>>  {
>>    return x * (x <= 0.f ? 1.f : -1.f);
>>  }
>> -/* { dg-final { scan-tree-dump-times "ABS" 8 "gimple"} } */
>> +/* { dg-final { scan-tree-dump-times "ABS" 4 "gimple"} } */
>> +/* { dg-final { scan-tree-dump-times "\.COPYSIGN" 4 "gimple"} } */
>> diff --git a/gcc/testsuite/gcc.target/aarch64/fneg-abs_1.c 
>> b/gcc/testsuite/gcc.target/aarch64/fneg-abs_1.c
>> new file mode 100644
>> index 
>> 0000000000000000000000000000000000000000..f823013c3ddf6b3a266c3abfcbf2642fc2a75fa6
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/fneg-abs_1.c
>> @@ -0,0 +1,39 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O3" } */
>> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
>> +
>> +#pragma GCC target "+nosve"
>> +
>> +#include <arm_neon.h>
>> +
>> +/*
>> +** t1:
>> +**  orr     v[0-9]+.2s, #128, lsl #24
>> +**  ret
>> +*/
>> +float32x2_t t1 (float32x2_t a)
>> +{
>> +  return vneg_f32 (vabs_f32 (a));
>> +}
>> +
>> +/*
>> +** t2:
>> +**  orr     v[0-9]+.4s, #128, lsl #24
>> +**  ret
>> +*/
>> +float32x4_t t2 (float32x4_t a)
>> +{
>> +  return vnegq_f32 (vabsq_f32 (a));
>> +}
>> +
>> +/*
>> +** t3:
>> +**  adrp    x0, .LC[0-9]+
>> +**  ldr     q[0-9]+, \[x0, #:lo12:.LC0\]
>> +**  orr     v[0-9]+.16b, v[0-9]+.16b, v[0-9]+.16b
>> +**  ret
>> +*/
>> +float64x2_t t3 (float64x2_t a)
>> +{
>> +  return vnegq_f64 (vabsq_f64 (a));
>> +}
>> diff --git a/gcc/testsuite/gcc.target/aarch64/fneg-abs_2.c 
>> b/gcc/testsuite/gcc.target/aarch64/fneg-abs_2.c
>> new file mode 100644
>> index 
>> 0000000000000000000000000000000000000000..141121176b309e4b2aa413dc55271a6e3c93d5e1
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/fneg-abs_2.c
>> @@ -0,0 +1,31 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O3" } */
>> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
>> +
>> +#pragma GCC target "+nosve"
>> +
>> +#include <arm_neon.h>
>> +#include <math.h>
>> +
>> +/*
>> +** f1:
>> +**  movi    v[0-9]+.2s, 0x80, lsl 24
>> +**  orr     v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
>> +**  ret
>> +*/
>> +float32_t f1 (float32_t a)
>> +{
>> +  return -fabsf (a);
>> +}
>> +
>> +/*
>> +** f2:
>> +**  mov     x0, -9223372036854775808
>> +**  fmov    d[0-9]+, x0
>> +**  orr     v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
>> +**  ret
>> +*/
>> +float64_t f2 (float64_t a)
>> +{
>> +  return -fabs (a);
>> +}
>> diff --git a/gcc/testsuite/gcc.target/aarch64/fneg-abs_3.c 
>> b/gcc/testsuite/gcc.target/aarch64/fneg-abs_3.c
>> new file mode 100644
>> index 
>> 0000000000000000000000000000000000000000..b4652173a95d104ddfa70c497f0627a61ea89d3b
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/fneg-abs_3.c
>> @@ -0,0 +1,36 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O3" } */
>> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
>> +
>> +#pragma GCC target "+nosve"
>> +
>> +#include <arm_neon.h>
>> +#include <math.h>
>> +
>> +/*
>> +** f1:
>> +**  ...
>> +**  ldr     q[0-9]+, \[x0\]
>> +**  orr     v[0-9]+.4s, #128, lsl #24
>> +**  str     q[0-9]+, \[x0\], 16
>> +**  ...
>> +*/
>> +void f1 (float32_t *a, int n)
>> +{
>> +  for (int i = 0; i < (n & -8); i++)
>> +   a[i] = -fabsf (a[i]);
>> +}
>> +
>> +/*
>> +** f2:
>> +**  ...
>> +**  ldr     q[0-9]+, \[x0\]
>> +**  orr     v[0-9]+.16b, v[0-9]+.16b, v[0-9]+.16b
>> +**  str     q[0-9]+, \[x0\], 16
>> +**  ...
>> +*/
>> +void f2 (float64_t *a, int n)
>> +{
>> +  for (int i = 0; i < (n & -8); i++)
>> +   a[i] = -fabs (a[i]);
>> +}
>> diff --git a/gcc/testsuite/gcc.target/aarch64/fneg-abs_4.c 
>> b/gcc/testsuite/gcc.target/aarch64/fneg-abs_4.c
>> new file mode 100644
>> index 
>> 0000000000000000000000000000000000000000..10879dea74462d34b26160eeb0bd54ead063166b
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/fneg-abs_4.c
>> @@ -0,0 +1,39 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O3" } */
>> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
>> +
>> +#pragma GCC target "+nosve"
>> +
>> +#include <string.h>
>> +
>> +/*
>> +** negabs:
>> +**  mov     x0, -9223372036854775808
>> +**  fmov    d[0-9]+, x0
>> +**  orr     v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
>> +**  ret
>> +*/
>> +double negabs (double x)
>> +{
>> +   unsigned long long y;
>> +   memcpy (&y, &x, sizeof(double));
>> +   y = y | (1UL << 63);
>> +   memcpy (&x, &y, sizeof(double));
>> +   return x;
>> +}
>> +
>> +/*
>> +** negabsf:
>> +**  movi    v[0-9]+.2s, 0x80, lsl 24
>> +**  orr     v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
>> +**  ret
>> +*/
>> +float negabsf (float x)
>> +{
>> +   unsigned int y;
>> +   memcpy (&y, &x, sizeof(float));
>> +   y = y | (1U << 31);
>> +   memcpy (&x, &y, sizeof(float));
>> +   return x;
>> +}
>> +
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_1.c 
>> b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_1.c
>> new file mode 100644
>> index 
>> 0000000000000000000000000000000000000000..0c7664e6de77a497682952653ffd417453854d52
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_1.c
>> @@ -0,0 +1,37 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O3" } */
>> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
>> +
>> +#include <arm_neon.h>
>> +
>> +/*
>> +** t1:
>> +**  orr     v[0-9]+.2s, #128, lsl #24
>> +**  ret
>> +*/
>> +float32x2_t t1 (float32x2_t a)
>> +{
>> +  return vneg_f32 (vabs_f32 (a));
>> +}
>> +
>> +/*
>> +** t2:
>> +**  orr     v[0-9]+.4s, #128, lsl #24
>> +**  ret
>> +*/
>> +float32x4_t t2 (float32x4_t a)
>> +{
>> +  return vnegq_f32 (vabsq_f32 (a));
>> +}
>> +
>> +/*
>> +** t3:
>> +**  adrp    x0, .LC[0-9]+
>> +**  ldr     q[0-9]+, \[x0, #:lo12:.LC0\]
>> +**  orr     v[0-9]+.16b, v[0-9]+.16b, v[0-9]+.16b
>> +**  ret
>> +*/
>> +float64x2_t t3 (float64x2_t a)
>> +{
>> +  return vnegq_f64 (vabsq_f64 (a));
>> +}
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_2.c 
>> b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_2.c
>> new file mode 100644
>> index 
>> 0000000000000000000000000000000000000000..a60cd31b9294af2dac69eed1c93f899bd5c78fca
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_2.c
>> @@ -0,0 +1,29 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O3" } */
>> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
>> +
>> +#include <arm_neon.h>
>> +#include <math.h>
>> +
>> +/*
>> +** f1:
>> +**  movi    v[0-9]+.2s, 0x80, lsl 24
>> +**  orr     v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
>> +**  ret
>> +*/
>> +float32_t f1 (float32_t a)
>> +{
>> +  return -fabsf (a);
>> +}
>> +
>> +/*
>> +** f2:
>> +**  mov     x0, -9223372036854775808
>> +**  fmov    d[0-9]+, x0
>> +**  orr     v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
>> +**  ret
>> +*/
>> +float64_t f2 (float64_t a)
>> +{
>> +  return -fabs (a);
>> +}
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_3.c 
>> b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_3.c
>> new file mode 100644
>> index 
>> 0000000000000000000000000000000000000000..1bf34328d8841de8e6b0a5458562a9f00e31c275
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_3.c
>> @@ -0,0 +1,34 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O3" } */
>> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
>> +
>> +#include <arm_neon.h>
>> +#include <math.h>
>> +
>> +/*
>> +** f1:
>> +**  ...
>> +**  ld1w    z[0-9]+.s, p[0-9]+/z, \[x0, x2, lsl 2\]
>> +**  orr     z[0-9]+.s, z[0-9]+.s, #0x80000000
>> +**  st1w    z[0-9]+.s, p[0-9]+, \[x0, x2, lsl 2\]
>> +**  ...
>> +*/
>> +void f1 (float32_t *a, int n)
>> +{
>> +  for (int i = 0; i < (n & -8); i++)
>> +   a[i] = -fabsf (a[i]);
>> +}
>> +
>> +/*
>> +** f2:
>> +**  ...
>> +**  ld1d    z[0-9]+.d, p[0-9]+/z, \[x0, x2, lsl 3\]
>> +**  orr     z[0-9]+.d, z[0-9]+.d, #0x8000000000000000
>> +**  st1d    z[0-9]+.d, p[0-9]+, \[x0, x2, lsl 3\]
>> +**  ...
>> +*/
>> +void f2 (float64_t *a, int n)
>> +{
>> +  for (int i = 0; i < (n & -8); i++)
>> +   a[i] = -fabs (a[i]);
>> +}
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_4.c 
>> b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_4.c
>> new file mode 100644
>> index 
>> 0000000000000000000000000000000000000000..21f2a8da2a5d44e3d01f6604ca7be87e3744d494
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_4.c
>> @@ -0,0 +1,37 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O3" } */
>> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
>> +
>> +#include <string.h>
>> +
>> +/*
>> +** negabs:
>> +**  mov     x0, -9223372036854775808
>> +**  fmov    d[0-9]+, x0
>> +**  orr     v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
>> +**  ret
>> +*/
>> +double negabs (double x)
>> +{
>> +   unsigned long long y;
>> +   memcpy (&y, &x, sizeof(double));
>> +   y = y | (1UL << 63);
>> +   memcpy (&x, &y, sizeof(double));
>> +   return x;
>> +}
>> +
>> +/*
>> +** negabsf:
>> +**  movi    v[0-9]+.2s, 0x80, lsl 24
>> +**  orr     v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
>> +**  ret
>> +*/
>> +float negabsf (float x)
>> +{
>> +   unsigned int y;
>> +   memcpy (&y, &x, sizeof(float));
>> +   y = y | (1U << 31);
>> +   memcpy (&x, &y, sizeof(float));
>> +   return x;
>> +}
>> +
>> 
