Re: [PATCH] match.pd: Add std::pow folding optimizations.

Jennifer Schmitz Tue, 22 Oct 2024 01:47:18 -0700


> On 21 Oct 2024, at 10:51, Richard Biener <rguent...@suse.de> wrote:
> 
> On Fri, 18 Oct 2024, Jennifer Schmitz wrote:
> 
>> This patch adds the following two simplifications in match.pd:
>> - pow (1.0/x, y) to pow (x, -y), avoiding the division
>> - pow (0.0, x) to 0.0, avoiding the call to pow.
>> The patterns are guarded by flag_unsafe_math_optimizations,
>> !flag_trapping_math, !flag_errno_math, !HONOR_SIGNED_ZEROS,
>> and !HONOR_INFINITIES.
>> 
>> Tests were added to confirm the application of the transform for float,
>> double, and long double.
>> 
>> The patch was bootstrapped and regtested on aarch64-linux-gnu and
>> x86_64-linux-gnu, no regression.
>> OK for mainline?
>> 
>> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>
>> 
>> gcc/
>>      * match.pd: Fold pow (1.0/x, y) -> pow (x, -y) and
>>      pow (0.0, x) -> 0.0.
>> 
>> gcc/testsuite/
>>      * gcc.dg/tree-ssa/pow_fold_1.c: New test.
>> ---
>> gcc/match.pd                               | 14 +++++++++
>> gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c | 34 ++++++++++++++++++++++
>> 2 files changed, 48 insertions(+)
>> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c
>> 
>> diff --git a/gcc/match.pd b/gcc/match.pd
>> index 12d81fcac0d..ba100b117e7 100644
>> --- a/gcc/match.pd
>> +++ b/gcc/match.pd
>> @@ -8203,6 +8203,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>>    (rdiv @0 (exps:s @1))
>>     (mult @0 (exps (negate @1)))))
>> 
>> + /* Simplify pow(1.0/x, y) into pow(x, -y).  */
>> + (if (! HONOR_INFINITIES (type)
>> +      && ! HONOR_SIGNED_ZEROS (type)
>> +      && ! flag_trapping_math
>> +      && ! flag_errno_math)
>> +  (simplify
>> +   (POW (rdiv:s real_onep@0 @1) @2)
>> +    (POW @1 (negate @2)))
> 
> This one shouldn't need HONOR_SIGNED_ZEROS?
> 
>> +
>> +  /* Simplify pow(0.0, x) into 0.0.  */
>> +  (simplify
>> +   (POW real_zerop@0 @1)
> 
> I think this needs !HONOR_NANS (type)?
> 
> Otherwise OK.
Thanks for the feedback, Richard and Andrew. I made the following changes to 
the patch (current version below):
- applied both patterns to POWI as well and added tests for powi, powif, and 
powil
- dropped the !HONOR_SIGNED_ZEROS guard from the first pattern and additionally 
guarded the second pattern by !HONOR_NANS (type)
- added tests for powf16


Now, I am encountering two problems:

First, the transform is not applied for float16 (even with 
-fexcess-precision=16). Do you know what the problem could be?

Second, validation on aarch64 shows a regression in tests
  - gcc.dg/recip_sqrt_mult_1.c and 
  - gcc.dg/recip_sqrt_mult_5.c,
because the pattern (POWI(1/x, y) -> POWI(x, -y)) is applied before the recip 
pass and prevents application of the recip patterns. The reason might be that 
the single-use restriction only works if the integer argument is non-constant: 
in the failing test cases the integer argument is 2, and the pattern is 
applied despite the :s flag.
For example, my pattern is **not** applied (single-use restriction works) for:
double res, res2;
void foo (double a, int b)
{
  double f (double);
  double t1 = 1.0 / a;
  res = __builtin_powi (t1, b);
  res2 = f (t1);
}

But the pattern **is** applied and single-use restriction does **not** work for:
double res, res2;
void foo (double a)
{
  double f (double);
  double t1 = 1.0 / a;
  res = __builtin_powi (t1, 2);
  res2 = f (t1);
}

Possible options to resolve this are:
  - gate pattern to run after recip pass
  - do not apply pattern for POWI
What are your thoughts on this?
Thanks,
Jennifer

This patch adds the following two simplifications in match.pd for POW
and POWI:
- pow (1.0/x, y) to pow (x, -y), avoiding the division
- pow (0.0, x) to 0.0, avoiding the call to pow.
The patterns are guarded by flag_unsafe_math_optimizations,
!flag_trapping_math, !flag_errno_math, and !HONOR_INFINITIES.
The second pattern is also guarded by !HONOR_NANS and
!HONOR_SIGNED_ZEROS.

Tests were added to confirm the application of the transform for
builtins pow, powf, powl, powi, powif, powil, and powf16.

The patch was bootstrapped and regtested on aarch64-linux-gnu and
x86_64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>

gcc/
        * match.pd: Fold pow (1.0/x, y) -> pow (x, -y) and
        pow (0.0, x) -> 0.0.

gcc/testsuite/
        * gcc.dg/tree-ssa/pow_fold_1.c: New test.
---
 gcc/match.pd                               | 15 ++++++++
 gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c | 42 ++++++++++++++++++++++
 2 files changed, 57 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 12d81fcac0d..b061ef9dc91 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -8203,6 +8203,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
    (rdiv @0 (exps:s @1))
     (mult @0 (exps (negate @1)))))
 
+ (for pow (POW POWI)
+  (if (! HONOR_INFINITIES (type)
+       && ! flag_trapping_math
+       && ! flag_errno_math)
+   /* Simplify pow(1.0/x, y) into pow(x, -y).  */
+   (simplify
+    (pow (rdiv:s real_onep@0 @1) @2)
+     (pow @1 (negate @2)))
+
+   /* Simplify pow(0.0, x) into 0.0.  */
+   (if (! HONOR_NANS (type) && ! HONOR_SIGNED_ZEROS (type))
+    (simplify
+     (pow real_zerop@0 @1)
+      @0))))
+
  (if (! HONOR_SIGN_DEPENDENT_ROUNDING (type)
       && ! HONOR_NANS (type) && ! HONOR_INFINITIES (type)
       && ! flag_trapping_math
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c b/gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c
new file mode 100644
index 00000000000..c38b7390478
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c
@@ -0,0 +1,42 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -fdump-tree-optimized" } */
+/* { dg-add-options float16 } */
+/* { dg-require-effective-target float16_runtime } */
+/* { dg-require-effective-target c99_runtime } */
+
+extern void link_error (void);
+
+#define POW1OVER(TYPE1, TYPE2, CTY, TY)                        \
+  void                                                 \
+  pow1over_##TY (TYPE1 x, TYPE2 y)                     \
+  {                                                    \
+    TYPE1 t1 = 1.0##CTY / x;                           \
+    TYPE1 t2 = __builtin_pow##TY (t1, y);              \
+    TYPE2 t3 = -y;                                     \
+    TYPE1 t4 = __builtin_pow##TY (x, t3);              \
+    if (t2 != t4)                                      \
+      link_error ();                                   \
+  }                                                    \
+
+#define POW0(TYPE1, TYPE2, CTY, TY)                    \
+  void                                                 \
+  pow0_##TY (TYPE2 x)                                  \
+  {                                                    \
+    TYPE1 t1 = __builtin_pow##TY (0.0##CTY, x);                \
+    if (t1 != 0.0##CTY)                                        \
+      link_error ();                                   \
+  }                                                    \
+
+#define TEST_ALL(TYPE1, TYPE2, CTY, TY)                        \
+  POW1OVER (TYPE1, TYPE2, CTY, TY)                     \
+  POW0 (TYPE1, TYPE2, CTY, TY)
+
+TEST_ALL (double, double, , )
+TEST_ALL (float, float, f, f)
+TEST_ALL (_Float16, _Float16, f16, f16)
+TEST_ALL (long double, long double, L, l)
+TEST_ALL (double, int, , i)
+TEST_ALL (float, int, f, if)
+TEST_ALL (long double, int, L, il)
+
+/* { dg-final { scan-tree-dump-not "link_error" "optimized" } } */
-- 
2.44.0

> 
> Richard.
> 
>> +    @0))
>> +
>>  (if (! HONOR_SIGN_DEPENDENT_ROUNDING (type)
>>       && ! HONOR_NANS (type) && ! HONOR_INFINITIES (type)
>>       && ! flag_trapping_math
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c b/gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c
>> new file mode 100644
>> index 00000000000..113df572661
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c
>> @@ -0,0 +1,34 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -ffast-math" } */
>> +/* { dg-require-effective-target c99_runtime } */
>> +
>> +extern void link_error (void);
>> +
>> +#define POW1OVER(TYPE, C_TY, TY)                     \
>> +  void                                                       \
>> +  pow1over_##TY (TYPE x, TYPE y)                     \
>> +  {                                                  \
>> +    TYPE t1 = 1.0##C_TY / x;                         \
>> +    TYPE t2 = __builtin_pow##TY (t1, y);             \
>> +    TYPE t3 = -y;                                    \
>> +    TYPE t4 = __builtin_pow##TY (x, t3);             \
>> +    if (t2 != t4)                                    \
>> +      link_error ();                                 \
>> +  }                                                  \
>> +
>> +#define POW0(TYPE, C_TY, TY)                         \
>> +  void                                                       \
>> +  pow0_##TY (TYPE x)                                 \
>> +  {                                                  \
>> +    TYPE t1 = __builtin_pow##TY (0.0##C_TY, x);              \
>> +    if (t1 != 0.0##C_TY)                             \
>> +      link_error ();                                 \
>> +  }                                                  \
>> +
>> +#define TEST_ALL(TYPE, C_TY, TY)                     \
>> +  POW1OVER (TYPE, C_TY, TY)                          \
>> +  POW0 (TYPE, C_TY, TY)
>> +
>> +TEST_ALL (double, , )
>> +TEST_ALL (float, f, f)
>> +TEST_ALL (long double, L, l)
>> 
> 
> --
> Richard Biener <rguent...@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
