> On 21 Oct 2024, at 10:51, Richard Biener <rguent...@suse.de> wrote: > > External email: Use caution opening links or attachments > > > On Fri, 18 Oct 2024, Jennifer Schmitz wrote: > >> This patch adds the following two simplifications in match.pd: >> - pow (1.0/x, y) to pow (x, -y), avoiding the division >> - pow (0.0, x) to 0.0, avoiding the call to pow. >> The patterns are guarded by flag_unsafe_math_optimizations, >> !flag_trapping_math, !flag_errno_math, !HONOR_SIGNED_ZEROS, >> and !HONOR_INFINITIES. >> >> Tests were added to confirm the application of the transform for float, >> double, and long double. >> >> The patch was bootstrapped and regtested on aarch64-linux-gnu and >> x86_64-linux-gnu, no regression. >> OK for mainline? >> >> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com> >> >> gcc/ >> * match.pd: Fold pow (1.0/x, y) -> pow (x, -y) and >> pow (0.0, x) -> 0.0. >> >> gcc/testsuite/ >> * gcc.dg/tree-ssa/pow_fold_1.c: New test. >> --- >> gcc/match.pd | 14 +++++++++ >> gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c | 34 ++++++++++++++++++++++ >> 2 files changed, 48 insertions(+) >> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c >> >> diff --git a/gcc/match.pd b/gcc/match.pd >> index 12d81fcac0d..ba100b117e7 100644 >> --- a/gcc/match.pd >> +++ b/gcc/match.pd >> @@ -8203,6 +8203,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) >> (rdiv @0 (exps:s @1)) >> (mult @0 (exps (negate @1))))) >> >> + /* Simplify pow(1.0/x, y) into pow(x, -y). */ >> + (if (! HONOR_INFINITIES (type) >> + && ! HONOR_SIGNED_ZEROS (type) >> + && ! flag_trapping_math >> + && ! flag_errno_math) >> + (simplify >> + (POW (rdiv:s real_onep@0 @1) @2) >> + (POW @1 (negate @2))) > > This one shouldn't need HONOR_SIGNED_ZEROS? > >> + >> + /* Simplify pow(0.0, x) into 0.0. */ >> + (simplify >> + (POW real_zerop@0 @1) > > I think this needs !HONOR_NANS (type)? > > Otherwise OK. Thanks for the feedback, Richard and Andrew. I made the following changes to the patch (current version of the patch below): - also applied the pattern to POWI and added tests for pow, powif, powil - not gate first pattern under !HONOR_SIGNED_ZEROS, but second one additionally under !HONOR_NANS (type) - added tests for powf16
Now, I am encountering two problems: First, the transform is not applied for float16 (even if -fexcess-precision=16). Do you know what the problem could be? Second, validation on aarch64 shows a regression in tests - gcc.dg/recip_sqrt_mult_1.c and - gcc.dg/recip_sqrt_mult_5.c, because the pattern (POWI(1/x, y) -> POWI(x, -y)) is applied before the recip pass and prevents application of the recip-patterns. The reason for this might be that the single-use restriction only work if the integer argument is non-constant, but in the failing test cases, the integer argument is 2 and the pattern is applied despite the :s flag. For example, my pattern is **not** applied (single-use restriction works) for: double res, res2; void foo (double a, int b) { double f (double); double t1 = 1.0 / a; res = __builtin_powi (t1, b); res2 = f (t1); } But the pattern **is** applied and single-use restriction does **not** work for: double res, res2; void foo (double a) { double f (double); double t1 = 1.0 / a; res = __builtin_powi (t1, 2); res2 = f (t1); } Possible options to resolve this are: - gate pattern to run after recip pass - do not apply pattern for POWI What are your thoughts on this? Thanks, Jennifer This patch adds the following two simplifications in match.pd for POW and POWI: - pow (1.0/x, y) to pow (x, -y), avoiding the division - pow (0.0, x) to 0.0, avoiding the call to pow. The patterns are guarded by flag_unsafe_math_optimizations, !flag_trapping_math, !flag_errno_math, and !HONOR_INFINITIES. The second pattern is also guarded by !HONOR_NANS and !HONOR_SIGNED_ZEROS. Tests were added to confirm the application of the transform for builtins pow, powf, powl, powi, powif, powil, and powf16. The patch was bootstrapped and regtested on aarch64-linux-gnu and x86_64-linux-gnu, no regression. OK for mainline? Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com> gcc/ * match.pd: Fold pow (1.0/x, y) -> pow (x, -y) and pow (0.0, x) -> 0.0. gcc/testsuite/ * gcc.dg/tree-ssa/pow_fold_1.c: New test. --- gcc/match.pd | 15 ++++++++ gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c | 42 ++++++++++++++++++++++ 2 files changed, 57 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c diff --git a/gcc/match.pd b/gcc/match.pd index 12d81fcac0d..b061ef9dc91 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -8203,6 +8203,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (rdiv @0 (exps:s @1)) (mult @0 (exps (negate @1))))) + (for pow (POW POWI) + (if (! HONOR_INFINITIES (type) + && ! flag_trapping_math + && ! flag_errno_math) + /* Simplify pow(1.0/x, y) into pow(x, -y). */ + (simplify + (pow (rdiv:s real_onep@0 @1) @2) + (pow @1 (negate @2))) + + /* Simplify pow(0.0, x) into 0.0. */ + (if (! HONOR_NANS (type) && ! HONOR_SIGNED_ZEROS (type)) + (simplify + (pow real_zerop@0 @1) + @0)))) + (if (! HONOR_SIGN_DEPENDENT_ROUNDING (type) && ! HONOR_NANS (type) && ! HONOR_INFINITIES (type) && ! flag_trapping_math diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c b/gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c new file mode 100644 index 00000000000..c38b7390478 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c @@ -0,0 +1,42 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ffast-math -fdump-tree-optimized" } */ +/* { dg-add-options float16 } */ +/* { dg-require-effective-target float16_runtime } */ +/* { dg-require-effective-target c99_runtime } */ + +extern void link_error (void); + +#define POW1OVER(TYPE1, TYPE2, CTY, TY) \ + void \ + pow1over_##TY (TYPE1 x, TYPE2 y) \ + { \ + TYPE1 t1 = 1.0##CTY / x; \ + TYPE1 t2 = __builtin_pow##TY (t1, y); \ + TYPE2 t3 = -y; \ + TYPE1 t4 = __builtin_pow##TY (x, t3); \ + if (t2 != t4) \ + link_error (); \ + } \ + +#define POW0(TYPE1, TYPE2, CTY, TY) \ + void \ + pow0_##TY (TYPE2 x) \ + { \ + TYPE1 t1 = __builtin_pow##TY (0.0##CTY, x); \ + if (t1 != 0.0##CTY) \ + link_error (); \ + } \ + +#define TEST_ALL(TYPE1, TYPE2, CTY, TY) \ + POW1OVER (TYPE1, TYPE2, CTY, TY) \ + POW0 (TYPE1, TYPE2, CTY, TY) + +TEST_ALL (double, double, , ) +TEST_ALL (float, float, f, f) +TEST_ALL (_Float16, _Float16, f16, f16) +TEST_ALL (long double, long double, L, l) +TEST_ALL (double, int, , i) +TEST_ALL (float, int, f, if) +TEST_ALL (long double, int, L, il) + +/* { dg-final { scan-tree-dump-not "link_error" "optimized" } } */ -- 2.44.0 > > Richard. > >> + @0)) >> + >> (if (! HONOR_SIGN_DEPENDENT_ROUNDING (type) >> && ! HONOR_NANS (type) && ! HONOR_INFINITIES (type) >> && ! flag_trapping_math >> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c >> b/gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c >> new file mode 100644 >> index 00000000000..113df572661 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c >> @@ -0,0 +1,34 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-O2 -ffast-math" } */ >> +/* { dg-require-effective-target c99_runtime } */ >> + >> +extern void link_error (void); >> + >> +#define POW1OVER(TYPE, C_TY, TY) \ >> + void \ >> + pow1over_##TY (TYPE x, TYPE y) \ >> + { \ >> + TYPE t1 = 1.0##C_TY / x; \ >> + TYPE t2 = __builtin_pow##TY (t1, y); \ >> + TYPE t3 = -y; \ >> + TYPE t4 = __builtin_pow##TY (x, t3); \ >> + if (t2 != t4) \ >> + link_error (); \ >> + } \ >> + >> +#define POW0(TYPE, C_TY, TY) \ >> + void \ >> + pow0_##TY (TYPE x) \ >> + { \ >> + TYPE t1 = __builtin_pow##TY (0.0##C_TY, x); \ >> + if (t1 != 0.0##C_TY) \ >> + link_error (); \ >> + } \ >> + >> +#define TEST_ALL(TYPE, C_TY, TY) \ >> + POW1OVER (TYPE, C_TY, TY) \ >> + POW0 (TYPE, C_TY, TY) >> + >> +TEST_ALL (double, , ) >> +TEST_ALL (float, f, f) >> +TEST_ALL (long double, L, l) >> > > -- > Richard Biener <rguent...@suse.de> > SUSE Software Solutions Germany GmbH, > Frankenstrasse 146, 90461 Nuernberg, Germany; > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
smime.p7s
Description: S/MIME cryptographic signature