Hi Jennifer,

> On 11 Oct 2024, at 10:00, Jennifer Schmitz <jschm...@nvidia.com> wrote:
> 
> 
> 
>> On 8 Oct 2024, at 10:44, Richard Biener <rguent...@suse.de> wrote:
>> 
>> External email: Use caution opening links or attachments
>> 
>> 
>> On Thu, 3 Oct 2024, Jennifer Schmitz wrote:
>> 
>>> 
>>> 
>>>> On 1 Oct 2024, at 14:27, Richard Biener <rguent...@suse.de> wrote:
>>>> 
>>>> External email: Use caution opening links or attachments
>>>> 
>>>> 
>>>> On Tue, 1 Oct 2024, Jennifer Schmitz wrote:
>>>> 
>>>>> This patch implements 4 rules for logarithmic identities in match.pd
>>>>> under -funsafe-math-optimizations:
>>>>> 1) logN(1.0/a) -> -logN(a). This avoids the division instruction.
>>>>> 2) logN(C/a) -> logN(C) - logN(a), where C is a real constant. Same as 1).
>>>>> 3) logN(a) + logN(b) -> logN(a*b). This reduces the number of calls to
>>>>> log function.
>>>>> 4) logN(a) - logN(b) -> logN(a/b). Same as 4).
>>>>> Tests were added for float, double, and long double.
>>>>> 
>>>>> The patch was bootstrapped and regtested on aarch64-linux-gnu and
>>>>> x86_64-linux-gnu, no regression.
>>>>> Additionally, SPEC 2017 fprate was run. While the transform does not seem
>>>>> to be triggered, we also see no non-noise impact on performance.
>>>>> OK for mainline?
>>>> 
>>>> Since log can set errno we have the builtins affect global memory and
>>>> thus have VDEFs, this posses issues for match.pd which does not assign
>>>> new VDEFs upon materializing the result, esp. for the case where
>>>> you duplicate a call.  There's a similar issue for -frounding-math
>>>> where intermediate FP status changes can be lost.  match.pd simply
>>>> follows the SSA use-def chains without regarding memory side-effects.
>>>> 
>>>> The transforms are guarded by flag_unsafe_math_optimizations but here
>>>> I think we need !HONOR_SIGN_DEPENDENT_ROUNDING, !flag_trapping_math
>>>> (exception state might be different for logN(a) - logN(b) -> logN(a/b),
>>>> at least WRT INEXACT?), and !flag_errno_math (because of the VDEFs).
>>>> 
>>>> +  /* Simplify logN(C/a) into logN(C)-logN(a).  */
>>>> +  (simplify
>>>> +   (logs (rdiv:s REAL_CST@0 @1))
>>>> +    (minus (logs @0) (logs @1)))
>>>> 
>>>> I think you want
>>>> 
>>>>   (minus (logs! @0) (logs @1))
>>>> 
>>>> here to make sure we constant-fold.
>>>> 
>>>> +  (simplify
>>>> +   (minus (logs:s @0) (logs:s @1))
>>>> +    (logs (rdiv @0 @1))))
>>>> 
>>>> I think that's somewhat dangerous for @1 == 0 given log for
>>>> zero arg results in -HUGE_VAL but a FP division by gives a NaN.
>>>> I'm not exactly sure whether !HONOR_INFINITIES && !HONOR_NANS
>>>> is good enough here.
>>>> 
>>>> Your testcases probably all trigger during GENERIC folding,
>>>> bypassing the VDEF issue - you might want to try assigning
>>>> the comparison operands to tempoaries to run into the actual
>>>> issues.
>>> Dear Richard,
>>> Thanks for the review and suggesting the additional flags. I added
>>> - !HONOR_SIGN_DEPENDENT_ROUNDING
>>> - !flag_trapping_math
>>> - !flag_errno_math
>>> - !HONOR_INFINITIES
>>> - !HONOR_NANS
>>> as guard before the patterns.
>>> Can we add anything else to account for HUGE_VAL or will !HONOR_INFINITIES 
>>> && !HONOR_NANS be enough? Or do you have a suggestion how I can check this?
>>> I validated again on aarch64 and x86_64.
>> 
>> Can you change your patch attachment format to be at least text/plain
>> and ideally just inline it or use git send-mail?  This way reviewing
>> is much easier.
>> 
>> I'll quote it here for reference:
>> 
>> (if (flag_unsafe_math_optimizations)
>> 
>> [...]
>> 
>> + (if (! HONOR_SIGN_DEPENDENT_ROUNDING (type)
>> +      && ! HONOR_NANS (type) && ! HONOR_INFINITIES (type)
>> +      && ! flag_trapping_math
>> +      && ! flag_errno_math)
>> +  (for logs (LOG LOG2 LOG10)
>> +   /* Simplify logN(1.0/a) into -logN(a).  */
>> +   (simplify
>> +    (logs (rdiv:s real_onep@0 @1))
>> +     (negate (logs @1)))
>> +
>> +   /* Simplify logN(C/a) into logN(C)-logN(a).  */
>> +   (simplify
>> +    (logs (rdiv:s REAL_CST@0 @1))
>> +     (minus (logs! @0) (logs @1)))
>> +
>> +   /* Simplify logN(a)+logN(b) into logN(a*b).  */
>> +   (simplify
>> +    (plus (logs:s @0) (logs:s @1))
>> +     (logs (mult @0 @1)))
>> +
>> +   /* Simplify logN(a)-logN(b) into logN(a/b).  */
>> +   (simplify
>> +    (minus (logs:s @0) (logs:s @1))
>> +     (logs (rdiv @0 @1)))))
>> 
>> I'm OK with the extra guards added but I'm also not a IEEE FP expert
>> here.  In the previous review I did mention the extra constraints
>> for specific sub-patterns IIRC.
>> 
>> As a general note we might want to implement a HONOR_ERRNO (combined_fn).
>> Probably a bad name, builtin_can_set_errno (combined_fn) might be better,
>> for example above I'm not sure whether all of log(), log2() and log10()
>> can set errno and how we generally should handle errno when
>> -fno-math-errno isn't given and GCC wasn't configured specifically
>> to target for example glibc.  For glibc it might be nice if we could
>> tell it we're not interested in errno from math functions and in this
>> way convey -fno-math-errno to it for example via a crtfastmath like
>> mechanism (or by calling alternate entry points).
>> 
>> So, the patch is OK in case Joseph doesn't have any comments during
>> this week.
> Thanks, then I will commit it at the end of the day, if Joseph does not have 
> further comments.

Also please add the bugzilla PR reference in the ChangeLog entry when 
committing.
Thanks,
Kyrill


> Best,
> Jennifer
>> 
>> Thanks,
>> Richard.


Reply via email to