> On 11 Oct 2024, at 13:18, Kyrylo Tkachov <ktkac...@nvidia.com> wrote:
> 
> Hi Jennifer,
> 
>> On 11 Oct 2024, at 10:00, Jennifer Schmitz <jschm...@nvidia.com> wrote:
>> 
>> 
>> 
>>> On 8 Oct 2024, at 10:44, Richard Biener <rguent...@suse.de> wrote:
>>> 
>>> External email: Use caution opening links or attachments
>>> 
>>> 
>>> On Thu, 3 Oct 2024, Jennifer Schmitz wrote:
>>> 
>>>> 
>>>> 
>>>>> On 1 Oct 2024, at 14:27, Richard Biener <rguent...@suse.de> wrote:
>>>>> 
>>>>> External email: Use caution opening links or attachments
>>>>> 
>>>>> 
>>>>> On Tue, 1 Oct 2024, Jennifer Schmitz wrote:
>>>>> 
>>>>>> This patch implements 4 rules for logarithmic identities in match.pd
>>>>>> under -funsafe-math-optimizations:
>>>>>> 1) logN(1.0/a) -> -logN(a). This avoids the division instruction.
>>>>>> 2) logN(C/a) -> logN(C) - logN(a), where C is a real constant. Same as 
>>>>>> 1).
>>>>>> 3) logN(a) + logN(b) -> logN(a*b). This reduces the number of calls to
>>>>>> log function.
>>>>>> 4) logN(a) - logN(b) -> logN(a/b). Same as 4).
>>>>>> Tests were added for float, double, and long double.
>>>>>> 
>>>>>> The patch was bootstrapped and regtested on aarch64-linux-gnu and
>>>>>> x86_64-linux-gnu, no regression.
>>>>>> Additionally, SPEC 2017 fprate was run. While the transform does not seem
>>>>>> to be triggered, we also see no non-noise impact on performance.
>>>>>> OK for mainline?
>>>>> 
>>>>> Since log can set errno we have the builtins affect global memory and
>>>>> thus have VDEFs, this posses issues for match.pd which does not assign
>>>>> new VDEFs upon materializing the result, esp. for the case where
>>>>> you duplicate a call.  There's a similar issue for -frounding-math
>>>>> where intermediate FP status changes can be lost.  match.pd simply
>>>>> follows the SSA use-def chains without regarding memory side-effects.
>>>>> 
>>>>> The transforms are guarded by flag_unsafe_math_optimizations but here
>>>>> I think we need !HONOR_SIGN_DEPENDENT_ROUNDING, !flag_trapping_math
>>>>> (exception state might be different for logN(a) - logN(b) -> logN(a/b),
>>>>> at least WRT INEXACT?), and !flag_errno_math (because of the VDEFs).
>>>>> 
>>>>> +  /* Simplify logN(C/a) into logN(C)-logN(a).  */
>>>>> +  (simplify
>>>>> +   (logs (rdiv:s REAL_CST@0 @1))
>>>>> +    (minus (logs @0) (logs @1)))
>>>>> 
>>>>> I think you want
>>>>> 
>>>>>  (minus (logs! @0) (logs @1))
>>>>> 
>>>>> here to make sure we constant-fold.
>>>>> 
>>>>> +  (simplify
>>>>> +   (minus (logs:s @0) (logs:s @1))
>>>>> +    (logs (rdiv @0 @1))))
>>>>> 
>>>>> I think that's somewhat dangerous for @1 == 0 given log for
>>>>> zero arg results in -HUGE_VAL but a FP division by gives a NaN.
>>>>> I'm not exactly sure whether !HONOR_INFINITIES && !HONOR_NANS
>>>>> is good enough here.
>>>>> 
>>>>> Your testcases probably all trigger during GENERIC folding,
>>>>> bypassing the VDEF issue - you might want to try assigning
>>>>> the comparison operands to tempoaries to run into the actual
>>>>> issues.
>>>> Dear Richard,
>>>> Thanks for the review and suggesting the additional flags. I added
>>>> - !HONOR_SIGN_DEPENDENT_ROUNDING
>>>> - !flag_trapping_math
>>>> - !flag_errno_math
>>>> - !HONOR_INFINITIES
>>>> - !HONOR_NANS
>>>> as guard before the patterns.
>>>> Can we add anything else to account for HUGE_VAL or will !HONOR_INFINITIES 
>>>> && !HONOR_NANS be enough? Or do you have a suggestion how I can check this?
>>>> I validated again on aarch64 and x86_64.
>>> 
>>> Can you change your patch attachment format to be at least text/plain
>>> and ideally just inline it or use git send-mail?  This way reviewing
>>> is much easier.
>>> 
>>> I'll quote it here for reference:
>>> 
>>> (if (flag_unsafe_math_optimizations)
>>> 
>>> [...]
>>> 
>>> + (if (! HONOR_SIGN_DEPENDENT_ROUNDING (type)
>>> +      && ! HONOR_NANS (type) && ! HONOR_INFINITIES (type)
>>> +      && ! flag_trapping_math
>>> +      && ! flag_errno_math)
>>> +  (for logs (LOG LOG2 LOG10)
>>> +   /* Simplify logN(1.0/a) into -logN(a).  */
>>> +   (simplify
>>> +    (logs (rdiv:s real_onep@0 @1))
>>> +     (negate (logs @1)))
>>> +
>>> +   /* Simplify logN(C/a) into logN(C)-logN(a).  */
>>> +   (simplify
>>> +    (logs (rdiv:s REAL_CST@0 @1))
>>> +     (minus (logs! @0) (logs @1)))
>>> +
>>> +   /* Simplify logN(a)+logN(b) into logN(a*b).  */
>>> +   (simplify
>>> +    (plus (logs:s @0) (logs:s @1))
>>> +     (logs (mult @0 @1)))
>>> +
>>> +   /* Simplify logN(a)-logN(b) into logN(a/b).  */
>>> +   (simplify
>>> +    (minus (logs:s @0) (logs:s @1))
>>> +     (logs (rdiv @0 @1)))))
>>> 
>>> I'm OK with the extra guards added but I'm also not a IEEE FP expert
>>> here.  In the previous review I did mention the extra constraints
>>> for specific sub-patterns IIRC.
>>> 
>>> As a general note we might want to implement a HONOR_ERRNO (combined_fn).
>>> Probably a bad name, builtin_can_set_errno (combined_fn) might be better,
>>> for example above I'm not sure whether all of log(), log2() and log10()
>>> can set errno and how we generally should handle errno when
>>> -fno-math-errno isn't given and GCC wasn't configured specifically
>>> to target for example glibc.  For glibc it might be nice if we could
>>> tell it we're not interested in errno from math functions and in this
>>> way convey -fno-math-errno to it for example via a crtfastmath like
>>> mechanism (or by calling alternate entry points).
>>> 
>>> So, the patch is OK in case Joseph doesn't have any comments during
>>> this week.
>> Thanks, then I will commit it at the end of the day, if Joseph does not have 
>> further comments.
> 
> Also please add the bugzilla PR reference in the ChangeLog entry when 
> committing.
> Thanks,
> Kyrill
Committed with 4be7d2d340a013d01a47c43d2feb6826d1b67af0.
Thanks,
Jennifer
> 
> 
>> Best,
>> Jennifer
>>> 
>>> Thanks,
>>> Richard.
> 
> 

Attachment: smime.p7s
Description: S/MIME cryptographic signature

Reply via email to