Hi Jennifer, > On 11 Oct 2024, at 10:00, Jennifer Schmitz <jschm...@nvidia.com> wrote: > > > >> On 8 Oct 2024, at 10:44, Richard Biener <rguent...@suse.de> wrote: >> >> External email: Use caution opening links or attachments >> >> >> On Thu, 3 Oct 2024, Jennifer Schmitz wrote: >> >>> >>> >>>> On 1 Oct 2024, at 14:27, Richard Biener <rguent...@suse.de> wrote: >>>> >>>> External email: Use caution opening links or attachments >>>> >>>> >>>> On Tue, 1 Oct 2024, Jennifer Schmitz wrote: >>>> >>>>> This patch implements 4 rules for logarithmic identities in match.pd >>>>> under -funsafe-math-optimizations: >>>>> 1) logN(1.0/a) -> -logN(a). This avoids the division instruction. >>>>> 2) logN(C/a) -> logN(C) - logN(a), where C is a real constant. Same as 1). >>>>> 3) logN(a) + logN(b) -> logN(a*b). This reduces the number of calls to >>>>> log function. >>>>> 4) logN(a) - logN(b) -> logN(a/b). Same as 4). >>>>> Tests were added for float, double, and long double. >>>>> >>>>> The patch was bootstrapped and regtested on aarch64-linux-gnu and >>>>> x86_64-linux-gnu, no regression. >>>>> Additionally, SPEC 2017 fprate was run. While the transform does not seem >>>>> to be triggered, we also see no non-noise impact on performance. >>>>> OK for mainline? >>>> >>>> Since log can set errno we have the builtins affect global memory and >>>> thus have VDEFs, this posses issues for match.pd which does not assign >>>> new VDEFs upon materializing the result, esp. for the case where >>>> you duplicate a call. There's a similar issue for -frounding-math >>>> where intermediate FP status changes can be lost. match.pd simply >>>> follows the SSA use-def chains without regarding memory side-effects. >>>> >>>> The transforms are guarded by flag_unsafe_math_optimizations but here >>>> I think we need !HONOR_SIGN_DEPENDENT_ROUNDING, !flag_trapping_math >>>> (exception state might be different for logN(a) - logN(b) -> logN(a/b), >>>> at least WRT INEXACT?), and !flag_errno_math (because of the VDEFs). >>>> >>>> + /* Simplify logN(C/a) into logN(C)-logN(a). */ >>>> + (simplify >>>> + (logs (rdiv:s REAL_CST@0 @1)) >>>> + (minus (logs @0) (logs @1))) >>>> >>>> I think you want >>>> >>>> (minus (logs! @0) (logs @1)) >>>> >>>> here to make sure we constant-fold. >>>> >>>> + (simplify >>>> + (minus (logs:s @0) (logs:s @1)) >>>> + (logs (rdiv @0 @1)))) >>>> >>>> I think that's somewhat dangerous for @1 == 0 given log for >>>> zero arg results in -HUGE_VAL but a FP division by gives a NaN. >>>> I'm not exactly sure whether !HONOR_INFINITIES && !HONOR_NANS >>>> is good enough here. >>>> >>>> Your testcases probably all trigger during GENERIC folding, >>>> bypassing the VDEF issue - you might want to try assigning >>>> the comparison operands to tempoaries to run into the actual >>>> issues. >>> Dear Richard, >>> Thanks for the review and suggesting the additional flags. I added >>> - !HONOR_SIGN_DEPENDENT_ROUNDING >>> - !flag_trapping_math >>> - !flag_errno_math >>> - !HONOR_INFINITIES >>> - !HONOR_NANS >>> as guard before the patterns. >>> Can we add anything else to account for HUGE_VAL or will !HONOR_INFINITIES >>> && !HONOR_NANS be enough? Or do you have a suggestion how I can check this? >>> I validated again on aarch64 and x86_64. >> >> Can you change your patch attachment format to be at least text/plain >> and ideally just inline it or use git send-mail? This way reviewing >> is much easier. >> >> I'll quote it here for reference: >> >> (if (flag_unsafe_math_optimizations) >> >> [...] >> >> + (if (! HONOR_SIGN_DEPENDENT_ROUNDING (type) >> + && ! HONOR_NANS (type) && ! HONOR_INFINITIES (type) >> + && ! flag_trapping_math >> + && ! flag_errno_math) >> + (for logs (LOG LOG2 LOG10) >> + /* Simplify logN(1.0/a) into -logN(a). */ >> + (simplify >> + (logs (rdiv:s real_onep@0 @1)) >> + (negate (logs @1))) >> + >> + /* Simplify logN(C/a) into logN(C)-logN(a). */ >> + (simplify >> + (logs (rdiv:s REAL_CST@0 @1)) >> + (minus (logs! @0) (logs @1))) >> + >> + /* Simplify logN(a)+logN(b) into logN(a*b). */ >> + (simplify >> + (plus (logs:s @0) (logs:s @1)) >> + (logs (mult @0 @1))) >> + >> + /* Simplify logN(a)-logN(b) into logN(a/b). */ >> + (simplify >> + (minus (logs:s @0) (logs:s @1)) >> + (logs (rdiv @0 @1))))) >> >> I'm OK with the extra guards added but I'm also not a IEEE FP expert >> here. In the previous review I did mention the extra constraints >> for specific sub-patterns IIRC. >> >> As a general note we might want to implement a HONOR_ERRNO (combined_fn). >> Probably a bad name, builtin_can_set_errno (combined_fn) might be better, >> for example above I'm not sure whether all of log(), log2() and log10() >> can set errno and how we generally should handle errno when >> -fno-math-errno isn't given and GCC wasn't configured specifically >> to target for example glibc. For glibc it might be nice if we could >> tell it we're not interested in errno from math functions and in this >> way convey -fno-math-errno to it for example via a crtfastmath like >> mechanism (or by calling alternate entry points). >> >> So, the patch is OK in case Joseph doesn't have any comments during >> this week. > Thanks, then I will commit it at the end of the day, if Joseph does not have > further comments.
Also please add the bugzilla PR reference in the ChangeLog entry when committing. Thanks, Kyrill > Best, > Jennifer >> >> Thanks, >> Richard.