On Thu, 5 Oct 2023, Tamar Christina wrote:

> > I suppose the idea is that -abs(x) might be easier to optimize with other
> > patterns (consider a - copysign(x,...), optimizing to a + abs(x)).
> >
> > For abs vs copysign it's a canonicalization, but (negate (abs @0)) is less
> > canonical than copysign.
> >
> > > Should I try removing this?
> >
> > I'd say yes (and put the reverse canonicalization next to this pattern).
>
> This patch transforms fneg (fabs (x)) into copysign (x, -1), which is more
> canonical and allows a target to expand this sequence efficiently.  Such
> sequences are common in scientific code working with gradients.
>
> Various optimizations in match.pd only happened on COPYSIGN but not on
> COPYSIGN_ALL, which means they exclude IFN_COPYSIGN.  COPYSIGN however is
> restricted to only
That's not true:

(define_operator_list COPYSIGN
    BUILT_IN_COPYSIGNF
    BUILT_IN_COPYSIGN
    BUILT_IN_COPYSIGNL
    IFN_COPYSIGN)

but they miss the extended float builtin variants like
__builtin_copysignf16.  Also see below.

> the C99 builtins and so doesn't work for vectors.
>
> The patch expands these optimizations to work on COPYSIGN_ALL.
>
> There is an existing canonicalization of copysign (x, -1) to fneg (fabs (x)),
> which I remove since this is a less efficient form.  The testsuite is also
> updated in light of this.
>
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>         PR tree-optimization/109154
>         * match.pd: Add new neg+abs rule, remove inverse copysign rule and
>         expand existing copysign optimizations.
>
> gcc/testsuite/ChangeLog:
>
>         PR tree-optimization/109154
>         * gcc.dg/fold-copysign-1.c: Updated.
>         * gcc.dg/pr55152-2.c: Updated.
>         * gcc.dg/tree-ssa/abs-4.c: Updated.
>         * gcc.dg/tree-ssa/backprop-6.c: Updated.
>         * gcc.dg/tree-ssa/copy-sign-2.c: Updated.
>         * gcc.dg/tree-ssa/mult-abs-2.c: Updated.
>         * gcc.target/aarch64/fneg-abs_1.c: New test.
>         * gcc.target/aarch64/fneg-abs_2.c: New test.
>         * gcc.target/aarch64/fneg-abs_3.c: New test.
>         * gcc.target/aarch64/fneg-abs_4.c: New test.
>         * gcc.target/aarch64/sve/fneg-abs_1.c: New test.
>         * gcc.target/aarch64/sve/fneg-abs_2.c: New test.
>         * gcc.target/aarch64/sve/fneg-abs_3.c: New test.
>         * gcc.target/aarch64/sve/fneg-abs_4.c: New test.
>
> --- inline copy of patch ---
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 4bdd83e6e061b16dbdb2845b9398fcfb8a6c9739..bd6599d36021e119f51a4928354f580ffe82c6e2 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -1074,45 +1074,43 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>
>  /* cos(copysign(x, y)) -> cos(x).  Similarly for cosh.  */
>  (for coss (COS COSH)
> -     copysigns (COPYSIGN)
> - (simplify
> -  (coss (copysigns @0 @1))
> -   (coss @0)))
> + (for copysigns (COPYSIGN_ALL)

So this ends up generating for example the match
(cosf (copysignl ...)) which doesn't make much sense.

The lock-step iteration did (cosf (copysignf ..)) ...
(ifn_cos (ifn_copysign ...)) which is leaner but misses the case of
(cosf (ifn_copysign ..)) - that's probably what you are after with this
change.

That said, there isn't a nice solution (without altering the match.pd
IL).  There's the explicit solution, spelling out all combinations.

So if we want to go with your pragmatic solution, changing this to use
COPYSIGN_ALL isn't necessary; only changing the lock-step for iteration
to a cross-product for iteration is.  Changing just this pattern to

(for coss (COS COSH)
 (for copysigns (COPYSIGN)
  (simplify
   (coss (copysigns @0 @1))
   (coss @0))))

increases the total number of gimple-match-x.cc lines from 234988 to
235324.  The alternative is to do

(for coss (COS COSH)
     copysigns (COPYSIGN)
 (simplify
  (coss (copysigns @0 @1))
  (coss @0))
 (simplify
  (coss (IFN_COPYSIGN @0 @1))
  (coss @0)))

which will properly diagnose a duplicate pattern.
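To make concrete which combinations the iteration is matching, here is
a plain C sketch of the cos/copysign simplification class (the C99
double form below is one the lock-step COPYSIGN list already catches;
the (cosf (ifn_copysign ..)) and extended-builtin forms are the
combinations at stake):

#include <math.h>

double f (double x, double y)
{
  /* cos is an even function, so the sign copied onto its argument
     is irrelevant; this folds to cos (x).  */
  return cos (copysign (x, y));
}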
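Stepping back, the canonicalization the patch is after can be shown
with a minimal scalar C sketch (it mirrors the negabs testcases in the
patch below; .COPYSIGN with a -1.0 operand is what the new
(negate (abs @0)) rule produces):

#include <math.h>

double negabs (double x)
{
  /* Folded by the new rule to .COPYSIGN (x, -1.0), i.e. "force the
     sign bit on", which a target like AArch64 can expand to a single
     bitwise orr instead of an abs followed by a neg.  */
  return -fabs (x);
}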
There are currently no operator lists with just builtins defined (that
could be fixed, see gencfn-macros.cc).  Supposing we had COS_C we could
do

(for coss (COS_C COSH_C IFN_COS IFN_COSH)
     copysigns (COPYSIGN_C COPYSIGN_C IFN_COPYSIGN IFN_COPYSIGN IFN_COPYSIGN
                IFN_COPYSIGN IFN_COPYSIGN IFN_COPYSIGN IFN_COPYSIGN
                IFN_COPYSIGN)
 (simplify
  (coss (copysigns @0 @1))
  (coss @0)))

which of course still looks ugly ;)  (Some syntax extension like
allowing to specify IFN_COPYSIGN*8 would be nice here and easy enough
to do.)

Can you split out the part changing COPYSIGN to COPYSIGN_ALL, re-do it
to only split the fors, keeping COPYSIGN, and provide some statistics
on the gimple-match-* size?  I think this might be the pragmatic
solution for now.

Richard - can you think of a clever way to express the desired
iteration?  How do RTL macro iterations address cases like this?

Richard.

> +  (simplify
> +   (coss (copysigns @0 @1))
> +   (coss @0))))
>
>  /* pow(copysign(x, y), z) -> pow(x, z) if z is an even integer.  */
>  (for pows (POW)
> -     copysigns (COPYSIGN)
> - (simplify
> -  (pows (copysigns @0 @2) REAL_CST@1)
> -  (with { HOST_WIDE_INT n; }
> -   (if (real_isinteger (&TREE_REAL_CST (@1), &n) && (n & 1) == 0)
> -    (pows @0 @1)))))
> + (for copysigns (COPYSIGN_ALL)
> +  (simplify
> +   (pows (copysigns @0 @2) REAL_CST@1)
> +   (with { HOST_WIDE_INT n; }
> +    (if (real_isinteger (&TREE_REAL_CST (@1), &n) && (n & 1) == 0)
> +     (pows @0 @1))))))
>  /* Likewise for powi.  */
>  (for pows (POWI)
> -     copysigns (COPYSIGN)
> - (simplify
> -  (pows (copysigns @0 @2) INTEGER_CST@1)
> -  (if ((wi::to_wide (@1) & 1) == 0)
> -   (pows @0 @1))))
> + (for copysigns (COPYSIGN_ALL)
> +  (simplify
> +   (pows (copysigns @0 @2) INTEGER_CST@1)
> +   (if ((wi::to_wide (@1) & 1) == 0)
> +    (pows @0 @1)))))
>
>  (for hypots (HYPOT)
> -     copysigns (COPYSIGN)
> - /* hypot(copysign(x, y), z) -> hypot(x, z).  */
> - (simplify
> -  (hypots (copysigns @0 @1) @2)
> -  (hypots @0 @2))
> - /* hypot(x, copysign(y, z)) -> hypot(x, y).  */
> - (simplify
> -  (hypots @0 (copysigns @1 @2))
> -  (hypots @0 @1)))
> + (for copysigns (COPYSIGN)
> +  /* hypot(copysign(x, y), z) -> hypot(x, z).  */
> +  (simplify
> +   (hypots (copysigns @0 @1) @2)
> +   (hypots @0 @2))
> +  /* hypot(x, copysign(y, z)) -> hypot(x, y).  */
> +  (simplify
> +   (hypots @0 (copysigns @1 @2))
> +   (hypots @0 @1))))
>
> -/* copysign(x, CST) -> [-]abs (x).  */
> -(for copysigns (COPYSIGN_ALL)
> - (simplify
> -  (copysigns @0 REAL_CST@1)
> -  (if (REAL_VALUE_NEGATIVE (TREE_REAL_CST (@1)))
> -   (negate (abs @0))
> -   (abs @0))))
> +/* Transform fneg (fabs (X)) -> copysign (X, -1).  */
> +
> +(simplify
> + (negate (abs @0))
> + (IFN_COPYSIGN @0 { build_minus_one_cst (type); }))
>
>  /* copysign(copysign(x, y), z) -> copysign(x, z).  */
>  (for copysigns (COPYSIGN_ALL)
> diff --git a/gcc/testsuite/gcc.dg/fold-copysign-1.c b/gcc/testsuite/gcc.dg/fold-copysign-1.c
> index f17d65c24ee4dca9867827d040fe0a404c515e7b..f9cafd14ab05f5e8ab2f6f68e62801d21c2df6a6 100644
> --- a/gcc/testsuite/gcc.dg/fold-copysign-1.c
> +++ b/gcc/testsuite/gcc.dg/fold-copysign-1.c
> @@ -12,5 +12,5 @@ double bar (double x)
>    return __builtin_copysign (x, minuszero);
>  }
>
> -/* { dg-final { scan-tree-dump-times "= -" 1 "cddce1" } } */
> -/* { dg-final { scan-tree-dump-times "= ABS_EXPR" 2 "cddce1" } } */
> +/* { dg-final { scan-tree-dump-times "__builtin_copysign" 1 "cddce1" } } */
> +/* { dg-final { scan-tree-dump-times "= ABS_EXPR" 1 "cddce1" } } */
> diff --git a/gcc/testsuite/gcc.dg/pr55152-2.c b/gcc/testsuite/gcc.dg/pr55152-2.c
> index 54db0f2062da105a829d6690ac8ed9891fe2b588..605f202ed6bc7aa8fe921457b02ff0b88cc63ce6 100644
> --- a/gcc/testsuite/gcc.dg/pr55152-2.c
> +++ b/gcc/testsuite/gcc.dg/pr55152-2.c
> @@ -10,4 +10,5 @@ int f(int a)
>    return (a<-a)?a:-a;
>  }
>
> -/* { dg-final { scan-tree-dump-times "ABS_EXPR" 2 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "\.COPYSIGN" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "ABS_EXPR" 1 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/abs-4.c b/gcc/testsuite/gcc.dg/tree-ssa/abs-4.c
> index 6197519faf7b55aed7bc162cd0a14dd2145210ca..e1b825f37f69ac3c4666b3a52d733368805ad31d 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/abs-4.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/abs-4.c
> @@ -9,5 +9,6 @@ long double abs_ld(long double x) { return __builtin_signbit(x) ? x : -x; }
>
>  /* __builtin_signbit(x) ? x : -x.  Should be convert into - ABS_EXP<x> */
>  /* { dg-final { scan-tree-dump-not "signbit" "optimized"} } */
> -/* { dg-final { scan-tree-dump-times "= ABS_EXPR" 3 "optimized"} } */
> -/* { dg-final { scan-tree-dump-times "= -" 3 "optimized"} } */
> +/* { dg-final { scan-tree-dump-times "= ABS_EXPR" 1 "optimized"} } */
> +/* { dg-final { scan-tree-dump-times "= -" 1 "optimized"} } */
> +/* { dg-final { scan-tree-dump-times "= \.COPYSIGN" 2 "optimized"} } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c b/gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c
> index 31f05716f1498dc709cac95fa20fb5796642c77e..c3a138642d6ff7be984e91fa1343cb2718db7ae1 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c
> @@ -26,5 +26,6 @@ TEST_FUNCTION (float, f)
>  TEST_FUNCTION (double, )
>  TEST_FUNCTION (long double, l)
>
> -/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -} 6 "backprop" } } */
> -/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = ABS_EXPR <} 3 "backprop" } } */
> +/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = -} 4 "backprop" } } */
> +/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = \.COPYSIGN} 2 "backprop" } } */
> +/* { dg-final { scan-tree-dump-times {Deleting[^\n]* = ABS_EXPR <} 1 "backprop" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/copy-sign-2.c b/gcc/testsuite/gcc.dg/tree-ssa/copy-sign-2.c
> index de52c5f7c8062958353d91f5031193defc9f3f91..e5d565c4b9832c00106588ef411fbd8c292a5cad 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/copy-sign-2.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/copy-sign-2.c
> @@ -10,4 +10,5 @@ float f1(float x)
>    float t = __builtin_copysignf (1.0f, -x);
>    return x * t;
>  }
> -/* { dg-final { scan-tree-dump-times "ABS" 2 "optimized"} } */
> +/* { dg-final { scan-tree-dump-times "ABS" 1 "optimized"} } */
> +/* { dg-final { scan-tree-dump-times ".COPYSIGN" 1 "optimized"} } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/mult-abs-2.c b/gcc/testsuite/gcc.dg/tree-ssa/mult-abs-2.c
> index a41f1baf25669a4fd301a586a49ba5e3c5b966ab..a22896b21c8b5a4d5d8e28bd8ae0db896e63ade0 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/mult-abs-2.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/mult-abs-2.c
> @@ -34,4 +34,5 @@ float i1(float x)
>  {
>    return x * (x <= 0.f ? 1.f : -1.f);
>  }
> -/* { dg-final { scan-tree-dump-times "ABS" 8 "gimple"} } */
> +/* { dg-final { scan-tree-dump-times "ABS" 4 "gimple"} } */
> +/* { dg-final { scan-tree-dump-times "\.COPYSIGN" 4 "gimple"} } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/fneg-abs_1.c b/gcc/testsuite/gcc.target/aarch64/fneg-abs_1.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..f823013c3ddf6b3a266c3abfcbf2642fc2a75fa6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/fneg-abs_1.c
> @@ -0,0 +1,39 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
> +
> +#pragma GCC target "+nosve"
> +
> +#include <arm_neon.h>
> +
> +/*
> +** t1:
> +** orr v[0-9]+.2s, #128, lsl #24
> +** ret
> +*/
> +float32x2_t t1 (float32x2_t a)
> +{
> +  return vneg_f32 (vabs_f32 (a));
> +}
> +
> +/*
> +** t2:
> +** orr v[0-9]+.4s, #128, lsl #24
> +** ret
> +*/
> +float32x4_t t2 (float32x4_t a)
> +{
> +  return vnegq_f32 (vabsq_f32 (a));
> +}
> +
> +/*
> +** t3:
> +** adrp x0, .LC[0-9]+
> +** ldr q[0-9]+, \[x0, #:lo12:.LC0\]
> +** orr v[0-9]+.16b, v[0-9]+.16b, v[0-9]+.16b
> +** ret
> +*/
> +float64x2_t t3 (float64x2_t a)
> +{
> +  return vnegq_f64 (vabsq_f64 (a));
> +}
> diff --git a/gcc/testsuite/gcc.target/aarch64/fneg-abs_2.c b/gcc/testsuite/gcc.target/aarch64/fneg-abs_2.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..141121176b309e4b2aa413dc55271a6e3c93d5e1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/fneg-abs_2.c
> @@ -0,0 +1,31 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
> +
> +#pragma GCC target "+nosve"
> +
> +#include <arm_neon.h>
> +#include <math.h>
> +
> +/*
> +** f1:
> +** movi v[0-9]+.2s, 0x80, lsl 24
> +** orr v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
> +** ret
> +*/
> +float32_t f1 (float32_t a)
> +{
> +  return -fabsf (a);
> +}
> +
> +/*
> +** f2:
> +** mov x0, -9223372036854775808
> +** fmov d[0-9]+, x0
> +** orr v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
> +** ret
> +*/
> +float64_t f2 (float64_t a)
> +{
> +  return -fabs (a);
> +}
> diff --git a/gcc/testsuite/gcc.target/aarch64/fneg-abs_3.c b/gcc/testsuite/gcc.target/aarch64/fneg-abs_3.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..b4652173a95d104ddfa70c497f0627a61ea89d3b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/fneg-abs_3.c
> @@ -0,0 +1,36 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
> +
> +#pragma GCC target "+nosve"
> +
> +#include <arm_neon.h>
> +#include <math.h>
> +
> +/*
> +** f1:
> +** ...
> +** ldr q[0-9]+, \[x0\]
> +** orr v[0-9]+.4s, #128, lsl #24
> +** str q[0-9]+, \[x0\], 16
> +** ...
> +*/
> +void f1 (float32_t *a, int n)
> +{
> +  for (int i = 0; i < (n & -8); i++)
> +    a[i] = -fabsf (a[i]);
> +}
> +
> +/*
> +** f2:
> +** ...
> +** ldr q[0-9]+, \[x0\]
> +** orr v[0-9]+.16b, v[0-9]+.16b, v[0-9]+.16b
> +** str q[0-9]+, \[x0\], 16
> +** ...
> +*/
> +void f2 (float64_t *a, int n)
> +{
> +  for (int i = 0; i < (n & -8); i++)
> +    a[i] = -fabs (a[i]);
> +}
> diff --git a/gcc/testsuite/gcc.target/aarch64/fneg-abs_4.c b/gcc/testsuite/gcc.target/aarch64/fneg-abs_4.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..10879dea74462d34b26160eeb0bd54ead063166b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/fneg-abs_4.c
> @@ -0,0 +1,39 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
> +
> +#pragma GCC target "+nosve"
> +
> +#include <string.h>
> +
> +/*
> +** negabs:
> +** mov x0, -9223372036854775808
> +** fmov d[0-9]+, x0
> +** orr v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
> +** ret
> +*/
> +double negabs (double x)
> +{
> +  unsigned long long y;
> +  memcpy (&y, &x, sizeof(double));
> +  y = y | (1UL << 63);
> +  memcpy (&x, &y, sizeof(double));
> +  return x;
> +}
> +
> +/*
> +** negabsf:
> +** movi v[0-9]+.2s, 0x80, lsl 24
> +** orr v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
> +** ret
> +*/
> +float negabsf (float x)
> +{
> +  unsigned int y;
> +  memcpy (&y, &x, sizeof(float));
> +  y = y | (1U << 31);
> +  memcpy (&x, &y, sizeof(float));
> +  return x;
> +}
> +
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_1.c b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_1.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..0c7664e6de77a497682952653ffd417453854d52
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_1.c
> @@ -0,0 +1,37 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
> +
> +#include <arm_neon.h>
> +
> +/*
> +** t1:
> +** orr v[0-9]+.2s, #128, lsl #24
> +** ret
> +*/
> +float32x2_t t1 (float32x2_t a)
> +{
> +  return vneg_f32 (vabs_f32 (a));
> +}
> +
> +/*
> +** t2:
> +** orr v[0-9]+.4s, #128, lsl #24
> +** ret
> +*/
> +float32x4_t t2 (float32x4_t a)
> +{
> +  return vnegq_f32 (vabsq_f32 (a));
> +}
> +
> +/*
> +** t3:
> +** adrp x0, .LC[0-9]+
> +** ldr q[0-9]+, \[x0, #:lo12:.LC0\]
> +** orr v[0-9]+.16b, v[0-9]+.16b, v[0-9]+.16b
> +** ret
> +*/
> +float64x2_t t3 (float64x2_t a)
> +{
> +  return vnegq_f64 (vabsq_f64 (a));
> +}
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_2.c b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_2.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..a60cd31b9294af2dac69eed1c93f899bd5c78fca
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_2.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
> +
> +#include <arm_neon.h>
> +#include <math.h>
> +
> +/*
> +** f1:
> +** movi v[0-9]+.2s, 0x80, lsl 24
> +** orr v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
> +** ret
> +*/
> +float32_t f1 (float32_t a)
> +{
> +  return -fabsf (a);
> +}
> +
> +/*
> +** f2:
> +** mov x0, -9223372036854775808
> +** fmov d[0-9]+, x0
> +** orr v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
> +** ret
> +*/
> +float64_t f2 (float64_t a)
> +{
> +  return -fabs (a);
> +}
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_3.c b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_3.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..1bf34328d8841de8e6b0a5458562a9f00e31c275
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_3.c
> @@ -0,0 +1,34 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
> +
> +#include <arm_neon.h>
> +#include <math.h>
> +
> +/*
> +** f1:
> +** ...
> +** ld1w z[0-9]+.s, p[0-9]+/z, \[x0, x2, lsl 2\]
> +** orr z[0-9]+.s, z[0-9]+.s, #0x80000000
> +** st1w z[0-9]+.s, p[0-9]+, \[x0, x2, lsl 2\]
> +** ...
> +*/
> +void f1 (float32_t *a, int n)
> +{
> +  for (int i = 0; i < (n & -8); i++)
> +    a[i] = -fabsf (a[i]);
> +}
> +
> +/*
> +** f2:
> +** ...
> +** ld1d z[0-9]+.d, p[0-9]+/z, \[x0, x2, lsl 3\]
> +** orr z[0-9]+.d, z[0-9]+.d, #0x8000000000000000
> +** st1d z[0-9]+.d, p[0-9]+, \[x0, x2, lsl 3\]
> +** ...
> +*/
> +void f2 (float64_t *a, int n)
> +{
> +  for (int i = 0; i < (n & -8); i++)
> +    a[i] = -fabs (a[i]);
> +}
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_4.c b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_4.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..21f2a8da2a5d44e3d01f6604ca7be87e3744d494
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_4.c
> @@ -0,0 +1,37 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
> +
> +#include <string.h>
> +
> +/*
> +** negabs:
> +** mov x0, -9223372036854775808
> +** fmov d[0-9]+, x0
> +** orr v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
> +** ret
> +*/
> +double negabs (double x)
> +{
> +  unsigned long long y;
> +  memcpy (&y, &x, sizeof(double));
> +  y = y | (1UL << 63);
> +  memcpy (&x, &y, sizeof(double));
> +  return x;
> +}
> +
> +/*
> +** negabsf:
> +** movi v[0-9]+.2s, 0x80, lsl 24
> +** orr v[0-9]+.8b, v[0-9]+.8b, v[0-9]+.8b
> +** ret
> +*/
> +float negabsf (float x)
> +{
> +  unsigned int y;
> +  memcpy (&y, &x, sizeof(float));
> +  y = y | (1U << 31);
> +  memcpy (&x, &y, sizeof(float));
> +  return x;
> +}
> +

--
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)