Re: [PATCH] match.pd: Add std::pow folding optimizations.

Jennifer Schmitz Fri, 25 Oct 2024 06:58:48 -0700

> On 25 Oct 2024, at 14:39, Richard Biener <rguent...@suse.de> wrote:
> 
> On Wed, 23 Oct 2024, Jennifer Schmitz wrote:
> 
>> 
>> 
>>> On 22 Oct 2024, at 13:14, Richard Biener <rguent...@suse.de> wrote:
>>> 
>>> On Tue, 22 Oct 2024, Jennifer Schmitz wrote:
>>> 
>>>> 
>>>> 
>>>>> On 22 Oct 2024, at 11:05, Richard Biener <rguent...@suse.de> wrote:
>>>>> 
>>>>> On Tue, 22 Oct 2024, Jennifer Schmitz wrote:
>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On 21 Oct 2024, at 10:51, Richard Biener <rguent...@suse.de> wrote:
>>>>>>> 
>>>>>>> On Fri, 18 Oct 2024, Jennifer Schmitz wrote:
>>>>>>> 
>>>>>>>> This patch adds the following two simplifications in match.pd:
>>>>>>>> - pow (1.0/x, y) to pow (x, -y), avoiding the division
>>>>>>>> - pow (0.0, x) to 0.0, avoiding the call to pow.
>>>>>>>> The patterns are guarded by flag_unsafe_math_optimizations,
>>>>>>>> !flag_trapping_math, !flag_errno_math, !HONOR_SIGNED_ZEROS,
>>>>>>>> and !HONOR_INFINITIES.
>>>>>>>> 
>>>>>>>> Tests were added to confirm the application of the transform for float,
>>>>>>>> double, and long double.
>>>>>>>> 
>>>>>>>> The patch was bootstrapped and regtested on aarch64-linux-gnu and
>>>>>>>> x86_64-linux-gnu, no regression.
>>>>>>>> OK for mainline?
>>>>>>>> 
>>>>>>>> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>
>>>>>>>> 
>>>>>>>> gcc/
>>>>>>>>   * match.pd: Fold pow (1.0/x, y) -> pow (x, -y) and
>>>>>>>>   pow (0.0, x) -> 0.0.
>>>>>>>> 
>>>>>>>> gcc/testsuite/
>>>>>>>>   * gcc.dg/tree-ssa/pow_fold_1.c: New test.
>>>>>>>> ---
>>>>>>>> gcc/match.pd                               | 14 +++++++++
>>>>>>>> gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c | 34 ++++++++++++++++++++++
>>>>>>>> 2 files changed, 48 insertions(+)
>>>>>>>> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c
>>>>>>>> 
>>>>>>>> diff --git a/gcc/match.pd b/gcc/match.pd
>>>>>>>> index 12d81fcac0d..ba100b117e7 100644
>>>>>>>> --- a/gcc/match.pd
>>>>>>>> +++ b/gcc/match.pd
>>>>>>>> @@ -8203,6 +8203,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>>>>>>>> (rdiv @0 (exps:s @1))
>>>>>>>>  (mult @0 (exps (negate @1)))))
>>>>>>>> 
>>>>>>>> + /* Simplify pow(1.0/x, y) into pow(x, -y).  */
>>>>>>>> + (if (! HONOR_INFINITIES (type)
>>>>>>>> +      && ! HONOR_SIGNED_ZEROS (type)
>>>>>>>> +      && ! flag_trapping_math
>>>>>>>> +      && ! flag_errno_math)
>>>>>>>> +  (simplify
>>>>>>>> +   (POW (rdiv:s real_onep@0 @1) @2)
>>>>>>>> +    (POW @1 (negate @2)))
>>>>>>> 
>>>>>>> This one shouldn't need HONOR_SIGNED_ZEROS?
>>>>>>> 
>>>>>>>> +
>>>>>>>> +  /* Simplify pow(0.0, x) into 0.0.  */
>>>>>>>> +  (simplify
>>>>>>>> +   (POW real_zerop@0 @1)
>>>>>>> 
>>>>>>> I think this needs !HONOR_NANS (type)?
>>>>>>> 
>>>>>>> Otherwise OK.
>>>>>> Thanks for the feedback, Richard and Andrew. I made the following 
>>>>>> changes to the patch (current version of the patch below):
>>>>>> - also applied the patterns to POWI and added tests for powi, powif, powil
>>>>>> - no longer gate the first pattern under !HONOR_SIGNED_ZEROS, but gate the
>>>>>> second one additionally under !HONOR_NANS (type)
>>>>>> - added tests for powf16
>>>>> 
>>>>> Note powi is GCC internal, it doesn't set errno and it should be subject
>>>>> to different rules - I'd rather have patterns working on powi separate.
>>>> How about moving the patterns for POWI into the section 
>>>> flag_unsafe_math_optimizations && canonicalize_math_p () and not use 
>>>> (!flag_errno_math)?
>>> 
>>> Sounds good.
>>> 
>>>>> 
>>>>>> Now, I am encountering two problems:
>>>>>> 
>>>>>> First, the transform is not applied for float16 (even if 
>>>>>> -fexcess-precision=16). Do you know what the problem could be?
>>>>> 
>>>>> I think you want to use POW_ALL instead of POW.  The generated
>>>>> cfn-operators.pd shows
>>>>> 
>>>>> (define_operator_list POW
>>>>>  BUILT_IN_POWF
>>>>>  BUILT_IN_POW
>>>>>  BUILT_IN_POWL
>>>>>  IFN_POW)
>>>>> (define_operator_list POW_FN
>>>>>  BUILT_IN_POWF16
>>>>>  BUILT_IN_POWF32
>>>>>  BUILT_IN_POWF64
>>>>>  BUILT_IN_POWF128
>>>>>  BUILT_IN_POWF32X
>>>>>  BUILT_IN_POWF64X
>>>>>  BUILT_IN_POWF128X
>>>>>  null)
>>>>> (define_operator_list POW_ALL
>>>>>  BUILT_IN_POWF
>>>>>  BUILT_IN_POW
>>>>>  BUILT_IN_POWL
>>>>>  BUILT_IN_POWF16
>>>>> ...
>>>>> 
>>>>> note this comes at expense of more generated code (in
>>>>> gimple/generic-match.pd).
>>>> Thanks, that solved the Float16 issue.
>>>>> 
>>>>>> Second, validation on aarch64 shows a regression in tests
>>>>>> - gcc.dg/recip_sqrt_mult_1.c and
>>>>>> - gcc.dg/recip_sqrt_mult_5.c,
>>>>>> because the pattern (POWI(1/x, y) -> POWI(x, -y)) is applied before the 
>>>>>> recip pass and prevents application of the recip-patterns. The reason 
>>>>>> for this might be that the single-use restriction only works if the 
>>>>>> integer argument is non-constant, but in the failing test cases, the 
>>>>>> integer argument is 2 and the pattern is applied despite the :s flag.
>>>>>> For example, my pattern is **not** applied (single-use restriction 
>>>>>> works) for:
>>>>>> double res, res2;
>>>>>> void foo (double a, int b)
>>>>>> {
>>>>>> double f (double);
>>>>>> double t1 = 1.0 / a;
>>>>>> res = __builtin_powi (t1, b);
>>>>>> res2 = f (t1);
>>>>>> }
>>>>>> 
>>>>>> But the pattern **is** applied and single-use restriction does **not** 
>>>>>> work for:
>>>>>> double res, res2;
>>>>>> void foo (double a)
>>>>>> {
>>>>>> double f (double);
>>>>>> double t1 = 1.0 / a;
>>>>>> res = __builtin_powi (t1, 2);
>>>>>> res2 = f (t1);
>>>>>> }
>>>>> 
>>>>> This must be because the result is a single operation.  :s only applies
>>>>> when the result has sub-expressions.  This is to make CSE work.
>>>>> The "fix" is to add explicit && single_use (@n) to override that
>>>>> behavior.  Note that I think the transform is good even when the
>>>>> division is used because the result reduces the dependence chain length.
>>>>> It's only when @2 is non-constant that we're introducing another
>>>>> stmt for the negation that re-introduces this latency (even if in
>>>>> practice it would be smaller).
>>>>> 
>>>>>> Possible options to resolve this are:
>>>>>> - gate pattern to run after recip pass
>>>>>> - do not apply pattern for POWI
>>>>> 
>>>>> - adjust the testcase (is the final outcome still good?)
>>>> Without the patch, there is one fdiv instruction less (below is the 
>>>> assembly for recip_sqrt_mult_1.c, but for _5.c it’s analogous):
>>>> No patch or with single_use of the result of the division:
>>>> foo:
>>>>     fmov d30, 1.0e+0
>>>>     fsqrt d31, d0
>>>>     adrp x0, .LANCHOR0
>>>>     add x1, x0, :lo12:.LANCHOR0
>>>>     fdiv d30, d30, d0
>>>>     fmul d0, d31, d30
>>>>     str d0, [x0, #:lo12:.LANCHOR0]
>>>>     stp d30, d31, [x1, 8]
>>>>     ret
>>>> 
>>>> With patch:
>>>> foo:
>>>>       fsqrt   d31, d0
>>>>       fmov    d30, 1.0e+0
>>>>       adrp    x1, .LANCHOR0
>>>>       add     x0, x1, :lo12:.LANCHOR0
>>>>       fdiv    d31, d30, d31
>>>>       fdiv    d30, d30, d0
>>>>       str     d31, [x1, #:lo12:.LANCHOR0]
>>>>       fmul    d31, d31, d0
>>>>       stp     d30, d31, [x0, 8]
>>>>       ret
>>>> So, we might want to use the single_use guard.
>>> 
>>> Yeah, this is because the powi inline expansion will add back
>>> the division.
>> Below is the updated patch; I re-validated it with no regressions on
>> aarch64 and x86_64.
>> Thanks,
>> Jenni
>> 
>> This patch adds the following two simplifications in match.pd for
>> POW_ALL and POWI:
>> - pow (1.0/x, y) to pow (x, -y), avoiding the division
>> - pow (0.0, x) to 0.0, avoiding the call to pow.
>> The patterns are guarded by flag_unsafe_math_optimizations,
>> !flag_trapping_math, and !HONOR_INFINITIES.
>> The POW_ALL patterns are also gated under !flag_errno_math.
>> The second pattern is also guarded by !HONOR_NANS and
>> !HONOR_SIGNED_ZEROS.
>> 
>> Tests were added to confirm the application of the transform for
>> builtins pow, powf, powl, powi, powif, powil, and powf16.
>> 
>> The patch was bootstrapped and regtested on aarch64-linux-gnu and
>> x86_64-linux-gnu, no regression.
>> OK for mainline?
> 
> OK.
Thanks, pushed to trunk: 07a8538d90763f0ae640dea822bdeb63ea17ec44
Jennifer
> 
> Thanks,
> Richard.
> 
>> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>
>> 
>> gcc/
>>      * match.pd: Fold pow (1.0/x, y) -> pow (x, -y) and
>>      pow (0.0, x) -> 0.0.
>> 
>> gcc/testsuite/
>>      * gcc.dg/tree-ssa/pow_fold_1.c: New test.
>> ---
>> gcc/match.pd                               | 28 +++++++++++++++
>> gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c | 42 ++++++++++++++++++++++
>> 2 files changed, 70 insertions(+)
>> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c
>> 
>> diff --git a/gcc/match.pd b/gcc/match.pd
>> index 12d81fcac0d..6d9868d2bb1 100644
>> --- a/gcc/match.pd
>> +++ b/gcc/match.pd
>> @@ -8203,6 +8203,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>>    (rdiv @0 (exps:s @1))
>>     (mult @0 (exps (negate @1)))))
>> 
>> + (for pow (POW_ALL)
>> +  (if (! HONOR_INFINITIES (type)
>> +       && ! flag_trapping_math
>> +       && ! flag_errno_math)
>> +   /* Simplify pow(1.0/x, y) into pow(x, -y).  */
>> +   (simplify
>> +    (pow (rdiv:s real_onep@0 @1) @2)
>> +     (pow @1 (negate @2)))
>> +
>> +   /* Simplify pow(0.0, x) into 0.0.  */
>> +   (if (! HONOR_NANS (type) && ! HONOR_SIGNED_ZEROS (type))
>> +    (simplify
>> +     (pow real_zerop@0 @1)
>> +      @0))))
>> +
>>  (if (! HONOR_SIGN_DEPENDENT_ROUNDING (type)
>>       && ! HONOR_NANS (type) && ! HONOR_INFINITIES (type)
>>       && ! flag_trapping_math
>> @@ -8561,6 +8576,19 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>>   (mult (POW:s @0 @1) (POW:s @2 @1))
>>    (POW (mult @0 @2) @1))
>> 
>> + (if (! HONOR_INFINITIES (type) && ! flag_trapping_math)
>> +  /* Simplify powi(1.0/x, y) into powi(x, -y).  */
>> +  (simplify
>> +   (POWI (rdiv@3 real_onep@0 @1) @2)
>> +   (if (single_use (@3))
>> +    (POWI @1 (negate @2))))
>> +
>> +  /* Simplify powi(0.0, x) into 0.0.  */
>> +  (if (! HONOR_NANS (type) && ! HONOR_SIGNED_ZEROS (type))
>> +   (simplify
>> +    (POWI real_zerop@0 @1)
>> +     @0)))
>> +
>>  /* Simplify powi(x,y) * powi(z,y) -> powi(x*z,y). */
>>  (simplify
>>   (mult (POWI:s @0 @1) (POWI:s @2 @1))
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c b/gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c
>> new file mode 100644
>> index 00000000000..d98bcb0827e
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pow_fold_1.c
>> @@ -0,0 +1,42 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-Ofast -fdump-tree-optimized -fexcess-precision=16" } */
>> +/* { dg-add-options float16 } */
>> +/* { dg-require-effective-target float16_runtime } */
>> +/* { dg-require-effective-target c99_runtime } */
>> +
>> +extern void link_error (void);
>> +
>> +#define POW1OVER(TYPE1, TYPE2, CTY, TY)           \
>> +  void                                            \
>> +  pow1over_##TY (TYPE1 x, TYPE2 y)                \
>> +  {                                               \
>> +    TYPE1 t1 = 1.0##CTY / x;                      \
>> +    TYPE1 t2 = __builtin_pow##TY (t1, y);         \
>> +    TYPE2 t3 = -y;                                \
>> +    TYPE1 t4 = __builtin_pow##TY (x, t3);         \
>> +    if (t2 != t4)                                 \
>> +      link_error ();                              \
>> +  }                                               \
>> +
>> +#define POW0(TYPE1, TYPE2, CTY, TY)               \
>> +  void                                            \
>> +  pow0_##TY (TYPE2 x)                             \
>> +  {                                               \
>> +    TYPE1 t1 = __builtin_pow##TY (0.0##CTY, x);   \
>> +    if (t1 != 0.0##CTY)                           \
>> +      link_error ();                              \
>> +  }                                               \
>> +
>> +#define TEST_ALL(TYPE1, TYPE2, CTY, TY)           \
>> +  POW1OVER (TYPE1, TYPE2, CTY, TY)                \
>> +  POW0 (TYPE1, TYPE2, CTY, TY)
>> +
>> +TEST_ALL (double, double, , )
>> +TEST_ALL (float, float, f, f)
>> +TEST_ALL (_Float16, _Float16, f16, f16)
>> +TEST_ALL (long double, long double, L, l)
>> +TEST_ALL (double, int, , i)
>> +TEST_ALL (float, int, f, if)
>> +TEST_ALL (long double, int, L, il)
>> +
>> +/* { dg-final { scan-tree-dump-not "link_error" "optimized" } } */
>> 
> 
> --
> Richard Biener <rguent...@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)