> On 25 Oct 2024, at 14:39, Richard Biener <rguent...@suse.de> wrote: > > External email: Use caution opening links or attachments > > > On Wed, 23 Oct 2024, Jennifer Schmitz wrote: > >> >> >>> On 22 Oct 2024, at 13:14, Richard Biener <rguent...@suse.de> wrote: >>> >>> External email: Use caution opening links or attachments >>> >>> >>> On Tue, 22 Oct 2024, Jennifer Schmitz wrote: >>> >>>> >>>> >>>>> On 22 Oct 2024, at 11:05, Richard Biener <rguent...@suse.de> wrote: >>>>> >>>>> External email: Use caution opening links or attachments >>>>> >>>>> >>>>> On Tue, 22 Oct 2024, Jennifer Schmitz wrote: >>>>> >>>>>> >>>>>> >>>>>>> On 21 Oct 2024, at 10:51, Richard Biener <rguent...@suse.de> wrote: >>>>>>> >>>>>>> External email: Use caution opening links or attachments >>>>>>> >>>>>>> >>>>>>> On Fri, 18 Oct 2024, Jennifer Schmitz wrote: >>>>>>> >>>>>>>> This patch adds the following two simplifications in match.pd: >>>>>>>> - pow (1.0/x, y) to pow (x, -y), avoiding the division >>>>>>>> - pow (0.0, x) to 0.0, avoiding the call to pow. >>>>>>>> The patterns are guarded by flag_unsafe_math_optimizations, >>>>>>>> !flag_trapping_math, !flag_errno_math, !HONOR_SIGNED_ZEROS, >>>>>>>> and !HONOR_INFINITIES. >>>>>>>> >>>>>>>> Tests were added to confirm the application of the transform for float, >>>>>>>> double, and long double. >>>>>>>> >>>>>>>> The patch was bootstrapped and regtested on aarch64-linux-gnu and >>>>>>>> x86_64-linux-gnu, no regression. >>>>>>>> OK for mainline? >>>>>>>> >>>>>>>> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com> >>>>>>>> >>>>>>>> gcc/ >>>>>>>> * match.pd: Fold pow (1.0/x, y) -> pow (x, -y) and >>>>>>>> pow (0.0, x) -> 0.0. >>>>>>>> >>>>>>>> gcc/testsuite/ >>>>>>>> * gcc.dg/tree-ssa/pow_fold_1.c: New test. >>>>>>>> --- >>>>>>>> gcc/match.pd | 14 +++++++++ >>>>>>>> gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c | 34 ++++++++++++++++++++++ >>>>>>>> 2 files changed, 48 insertions(+) >>>>>>>> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c >>>>>>>> >>>>>>>> diff --git a/gcc/match.pd b/gcc/match.pd >>>>>>>> index 12d81fcac0d..ba100b117e7 100644 >>>>>>>> --- a/gcc/match.pd >>>>>>>> +++ b/gcc/match.pd >>>>>>>> @@ -8203,6 +8203,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) >>>>>>>> (rdiv @0 (exps:s @1)) >>>>>>>> (mult @0 (exps (negate @1))))) >>>>>>>> >>>>>>>> + /* Simplify pow(1.0/x, y) into pow(x, -y). */ >>>>>>>> + (if (! HONOR_INFINITIES (type) >>>>>>>> + && ! HONOR_SIGNED_ZEROS (type) >>>>>>>> + && ! flag_trapping_math >>>>>>>> + && ! flag_errno_math) >>>>>>>> + (simplify >>>>>>>> + (POW (rdiv:s real_onep@0 @1) @2) >>>>>>>> + (POW @1 (negate @2))) >>>>>>> >>>>>>> This one shouldn't need HONOR_SIGNED_ZEROS? >>>>>>> >>>>>>>> + >>>>>>>> + /* Simplify pow(0.0, x) into 0.0. */ >>>>>>>> + (simplify >>>>>>>> + (POW real_zerop@0 @1) >>>>>>> >>>>>>> I think this needs !HONOR_NANS (type)? >>>>>>> >>>>>>> Otherwise OK. >>>>>> Thanks for the feedback, Richard and Andrew. I made the following >>>>>> changes to the patch (current version of the patch below): >>>>>> - also applied the pattern to POWI and added tests for pow, powif, powil >>>>>> - not gate first pattern under !HONOR_SIGNED_ZEROS, but second one >>>>>> additionally under !HONOR_NANS (type) >>>>>> - added tests for powf16 >>>>> >>>>> Note powi is GCC internal, it doesn't set errno and it should be subject >>>>> to different rules - I'd rather have patterns working on powi separate. >>>> How about moving the patterns for POWI into the section >>>> flag_unsafe_math_optimizations && canonicalize_math_p () and not use >>>> (!flag_errno_math)? >>> >>> Sounds good. >>> >>>>> >>>>>> Now, I am encountering two problems: >>>>>> >>>>>> First, the transform is not applied for float16 (even if >>>>>> -fexcess-precision=16). Do you know what the problem could be? >>>>> >>>>> I think you want to use POW_ALL instead of POW. The generated >>>>> cfn-operators.pd shows >>>>> >>>>> (define_operator_list POW >>>>> BUILT_IN_POWF >>>>> BUILT_IN_POW >>>>> BUILT_IN_POWL >>>>> IFN_POW) >>>>> (define_operator_list POW_FN >>>>> BUILT_IN_POWF16 >>>>> BUILT_IN_POWF32 >>>>> BUILT_IN_POWF64 >>>>> BUILT_IN_POWF128 >>>>> BUILT_IN_POWF32X >>>>> BUILT_IN_POWF64X >>>>> BUILT_IN_POWF128X >>>>> null) >>>>> (define_operator_list POW_ALL >>>>> BUILT_IN_POWF >>>>> BUILT_IN_POW >>>>> BUILT_IN_POWL >>>>> BUILT_IN_POWF16 >>>>> ... >>>>> >>>>> note this comes at expense of more generated code (in >>>>> gimple/generic-match.pd). >>>> Thanks, that solved the Float16 issue. >>>>> >>>>>> Second, validation on aarch64 shows a regression in tests >>>>>> - gcc.dg/recip_sqrt_mult_1.c and >>>>>> - gcc.dg/recip_sqrt_mult_5.c, >>>>>> because the pattern (POWI(1/x, y) -> POWI(x, -y)) is applied before the >>>>>> recip pass and prevents application of the recip-patterns. The reason >>>>>> for this might be that the single-use restriction only work if the >>>>>> integer argument is non-constant, but in the failing test cases, the >>>>>> integer argument is 2 and the pattern is applied despite the :s flag. >>>>>> For example, my pattern is **not** applied (single-use restriction >>>>>> works) for: >>>>>> double res, res2; >>>>>> void foo (double a, int b) >>>>>> { >>>>>> double f (double); >>>>>> double t1 = 1.0 / a; >>>>>> res = __builtin_powi (t1, b); >>>>>> res2 = f (t1); >>>>>> } >>>>>> >>>>>> But the pattern **is** applied and single-use restriction does **not** >>>>>> work for: >>>>>> double res, res2; >>>>>> void foo (double a) >>>>>> { >>>>>> double f (double); >>>>>> double t1 = 1.0 / a; >>>>>> res = __builtin_powi (t1, 2); >>>>>> res2 = f (t1); >>>>>> } >>>>> >>>>> This must be because the result is a single operation. :s only applies >>>>> when the result has sub-expresions. This is to make CSE work. >>>>> The "fix" is to add explicit && single_use (@n) to override that >>>>> behavior. Note that I think the transform is good even when the >>>>> division is used because the result reduces the dependence chain length. >>>>> It's only when @2 is non-constant that we're introducing another >>>>> stmt for the negation that re-introduces this latency (even if in >>>>> practice it would be smaller). >>>>> >>>>>> Possible options to resolve this are: >>>>>> - gate pattern to run after recip pass >>>>>> - do not apply pattern for POWI >>>>> >>>>> - adjust the testcase (is the final outcome still good?) >>>> Without the patch, there is one fdiv instruction less (below is the >>>> assembly for recip_sqrt_mult_1.c, but for _5.c it’s analogous): >>>> No patch or with single_use of the result of the division: >>>> foo: >>>> fmov d30, 1.0e+0 >>>> fsqrt d31, d0 >>>> adrp x0, .LANCHOR0 >>>> add x1, x0, :lo12:.LANCHOR0 >>>> fdiv d30, d30, d0 >>>> fmul d0, d31, d30 >>>> str d0, [x0, #:lo12:.LANCHOR0] >>>> stp d30, d31, [x1, 8] >>>> ret >>>> >>>> With patch: >>>> foo: >>>> fsqrt d31, d0 >>>> fmov d30, 1.0e+0 >>>> adrp x1, .LANCHOR0 >>>> add x0, x1, :lo12:.LANCHOR0 >>>> fdiv d31, d30, d31 >>>> fdiv d30, d30, d0 >>>> str d31, [x1, #:lo12:.LANCHOR0] >>>> fmul d31, d31, d0 >>>> stp d30, d31, [x0, 8] >>>> ret >>>> So, we might want to use the single_use guard. >>> >>> Yeah, this is because the powi inline expansion will add back >>> the division. >> Below is the updated patch, I re-validated with no regression on aarch64 and >> x86_64. >> Thanks, >> Jenni >> >> This patch adds the following two simplifications in match.pd for >> POW_ALL and POWI: >> - pow (1.0/x, y) to pow (x, -y), avoiding the division >> - pow (0.0, x) to 0.0, avoiding the call to pow. >> The patterns are guarded by flag_unsafe_math_optimizations, >> !flag_trapping_math, and !HONOR_INFINITIES. >> The POW_ALL patterns are also gated under !flag_errno_math. >> The second pattern is also guarded by !HONOR_NANS and >> !HONOR_SIGNED_ZEROS. >> >> Tests were added to confirm the application of the transform for >> builtins pow, powf, powl, powi, powif, powil, and powf16. >> >> The patch was bootstrapped and regtested on aarch64-linux-gnu and >> x86_64-linux-gnu, no regression. >> OK for mainline? > > OK. Thanks, pushed to trunk: 07a8538d90763f0ae640dea822bdeb63ea17ec44 Jennifer > > Thanks, > Richard. > >> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com> >> >> gcc/ >> * match.pd: Fold pow (1.0/x, y) -> pow (x, -y) and >> pow (0.0, x) -> 0.0. >> >> gcc/testsuite/ >> * gcc.dg/tree-ssa/pow_fold_1.c: New test. >> --- >> gcc/match.pd | 28 +++++++++++++++ >> gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c | 42 ++++++++++++++++++++++ >> 2 files changed, 70 insertions(+) >> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c >> >> diff --git a/gcc/match.pd b/gcc/match.pd >> index 12d81fcac0d..6d9868d2bb1 100644 >> --- a/gcc/match.pd >> +++ b/gcc/match.pd >> @@ -8203,6 +8203,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) >> (rdiv @0 (exps:s @1)) >> (mult @0 (exps (negate @1))))) >> >> + (for pow (POW_ALL) >> + (if (! HONOR_INFINITIES (type) >> + && ! flag_trapping_math >> + && ! flag_errno_math) >> + /* Simplify pow(1.0/x, y) into pow(x, -y). */ >> + (simplify >> + (pow (rdiv:s real_onep@0 @1) @2) >> + (pow @1 (negate @2))) >> + >> + /* Simplify pow(0.0, x) into 0.0. */ >> + (if (! HONOR_NANS (type) && ! HONOR_SIGNED_ZEROS (type)) >> + (simplify >> + (pow real_zerop@0 @1) >> + @0)))) >> + >> (if (! HONOR_SIGN_DEPENDENT_ROUNDING (type) >> && ! HONOR_NANS (type) && ! HONOR_INFINITIES (type) >> && ! flag_trapping_math >> @@ -8561,6 +8576,19 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) >> (mult (POW:s @0 @1) (POW:s @2 @1)) >> (POW (mult @0 @2) @1)) >> >> + (if (! HONOR_INFINITIES (type) && ! flag_trapping_math) >> + /* Simplify powi(1.0/x, y) into powi(x, -y). */ >> + (simplify >> + (POWI (rdiv@3 real_onep@0 @1) @2) >> + (if (single_use (@3)) >> + (POWI @1 (negate @2)))) >> + >> + /* Simplify powi(0.0, x) into 0.0. */ >> + (if (! HONOR_NANS (type) && ! HONOR_SIGNED_ZEROS (type)) >> + (simplify >> + (POWI real_zerop@0 @1) >> + @0))) >> + >> /* Simplify powi(x,y) * powi(z,y) -> powi(x*z,y). */ >> (simplify >> (mult (POWI:s @0 @1) (POWI:s @2 @1)) >> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c >> b/gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c >> new file mode 100644 >> index 00000000000..d98bcb0827e >> --- /dev/null >> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c >> @@ -0,0 +1,42 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-Ofast -fdump-tree-optimized -fexcess-precision=16" } */ >> +/* { dg-add-options float16 } */ >> +/* { dg-require-effective-target float16_runtime } */ >> +/* { dg-require-effective-target c99_runtime } */ >> + >> +extern void link_error (void); >> + >> +#define POW1OVER(TYPE1, TYPE2, CTY, TY) \ >> + void \ >> + pow1over_##TY (TYPE1 x, TYPE2 y) \ >> + { \ >> + TYPE1 t1 = 1.0##CTY / x; \ >> + TYPE1 t2 = __builtin_pow##TY (t1, y); \ >> + TYPE2 t3 = -y; \ >> + TYPE1 t4 = __builtin_pow##TY (x, t3); \ >> + if (t2 != t4) \ >> + link_error (); \ >> + } \ >> + >> +#define POW0(TYPE1, TYPE2, CTY, TY) \ >> + void \ >> + pow0_##TY (TYPE2 x) \ >> + { \ >> + TYPE1 t1 = __builtin_pow##TY (0.0##CTY, x); \ >> + if (t1 != 0.0##CTY) \ >> + link_error (); \ >> + } \ >> + >> +#define TEST_ALL(TYPE1, TYPE2, CTY, TY) \ >> + POW1OVER (TYPE1, TYPE2, CTY, TY) \ >> + POW0 (TYPE1, TYPE2, CTY, TY) >> + >> +TEST_ALL (double, double, , ) >> +TEST_ALL (float, float, f, f) >> +TEST_ALL (_Float16, _Float16, f16, f16) >> +TEST_ALL (long double, long double, L, l) >> +TEST_ALL (double, int, , i) >> +TEST_ALL (float, int, f, if) >> +TEST_ALL (long double, int, L, il) >> + >> +/* { dg-final { scan-tree-dump-not "link_error" "optimized" } } */ >> > > -- > Richard Biener <rguent...@suse.de> > SUSE Software Solutions Germany GmbH, > Frankenstrasse 146, 90461 Nuernberg, Germany; > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
smime.p7s
Description: S/MIME cryptographic signature