> On 11 Oct 2024, at 13:18, Kyrylo Tkachov <ktkac...@nvidia.com> wrote: > > Hi Jennifer, > >> On 11 Oct 2024, at 10:00, Jennifer Schmitz <jschm...@nvidia.com> wrote: >> >> >> >>> On 8 Oct 2024, at 10:44, Richard Biener <rguent...@suse.de> wrote: >>> >>> External email: Use caution opening links or attachments >>> >>> >>> On Thu, 3 Oct 2024, Jennifer Schmitz wrote: >>> >>>> >>>> >>>>> On 1 Oct 2024, at 14:27, Richard Biener <rguent...@suse.de> wrote: >>>>> >>>>> External email: Use caution opening links or attachments >>>>> >>>>> >>>>> On Tue, 1 Oct 2024, Jennifer Schmitz wrote: >>>>> >>>>>> This patch implements 4 rules for logarithmic identities in match.pd >>>>>> under -funsafe-math-optimizations: >>>>>> 1) logN(1.0/a) -> -logN(a). This avoids the division instruction. >>>>>> 2) logN(C/a) -> logN(C) - logN(a), where C is a real constant. Same as >>>>>> 1). >>>>>> 3) logN(a) + logN(b) -> logN(a*b). This reduces the number of calls to >>>>>> log function. >>>>>> 4) logN(a) - logN(b) -> logN(a/b). Same as 4). >>>>>> Tests were added for float, double, and long double. >>>>>> >>>>>> The patch was bootstrapped and regtested on aarch64-linux-gnu and >>>>>> x86_64-linux-gnu, no regression. >>>>>> Additionally, SPEC 2017 fprate was run. While the transform does not seem >>>>>> to be triggered, we also see no non-noise impact on performance. >>>>>> OK for mainline? >>>>> >>>>> Since log can set errno we have the builtins affect global memory and >>>>> thus have VDEFs, this posses issues for match.pd which does not assign >>>>> new VDEFs upon materializing the result, esp. for the case where >>>>> you duplicate a call. There's a similar issue for -frounding-math >>>>> where intermediate FP status changes can be lost. match.pd simply >>>>> follows the SSA use-def chains without regarding memory side-effects. >>>>> >>>>> The transforms are guarded by flag_unsafe_math_optimizations but here >>>>> I think we need !HONOR_SIGN_DEPENDENT_ROUNDING, !flag_trapping_math >>>>> (exception state might be different for logN(a) - logN(b) -> logN(a/b), >>>>> at least WRT INEXACT?), and !flag_errno_math (because of the VDEFs). >>>>> >>>>> + /* Simplify logN(C/a) into logN(C)-logN(a). */ >>>>> + (simplify >>>>> + (logs (rdiv:s REAL_CST@0 @1)) >>>>> + (minus (logs @0) (logs @1))) >>>>> >>>>> I think you want >>>>> >>>>> (minus (logs! @0) (logs @1)) >>>>> >>>>> here to make sure we constant-fold. >>>>> >>>>> + (simplify >>>>> + (minus (logs:s @0) (logs:s @1)) >>>>> + (logs (rdiv @0 @1)))) >>>>> >>>>> I think that's somewhat dangerous for @1 == 0 given log for >>>>> zero arg results in -HUGE_VAL but a FP division by gives a NaN. >>>>> I'm not exactly sure whether !HONOR_INFINITIES && !HONOR_NANS >>>>> is good enough here. >>>>> >>>>> Your testcases probably all trigger during GENERIC folding, >>>>> bypassing the VDEF issue - you might want to try assigning >>>>> the comparison operands to tempoaries to run into the actual >>>>> issues. >>>> Dear Richard, >>>> Thanks for the review and suggesting the additional flags. I added >>>> - !HONOR_SIGN_DEPENDENT_ROUNDING >>>> - !flag_trapping_math >>>> - !flag_errno_math >>>> - !HONOR_INFINITIES >>>> - !HONOR_NANS >>>> as guard before the patterns. >>>> Can we add anything else to account for HUGE_VAL or will !HONOR_INFINITIES >>>> && !HONOR_NANS be enough? Or do you have a suggestion how I can check this? >>>> I validated again on aarch64 and x86_64. >>> >>> Can you change your patch attachment format to be at least text/plain >>> and ideally just inline it or use git send-mail? This way reviewing >>> is much easier. >>> >>> I'll quote it here for reference: >>> >>> (if (flag_unsafe_math_optimizations) >>> >>> [...] >>> >>> + (if (! HONOR_SIGN_DEPENDENT_ROUNDING (type) >>> + && ! HONOR_NANS (type) && ! HONOR_INFINITIES (type) >>> + && ! flag_trapping_math >>> + && ! flag_errno_math) >>> + (for logs (LOG LOG2 LOG10) >>> + /* Simplify logN(1.0/a) into -logN(a). */ >>> + (simplify >>> + (logs (rdiv:s real_onep@0 @1)) >>> + (negate (logs @1))) >>> + >>> + /* Simplify logN(C/a) into logN(C)-logN(a). */ >>> + (simplify >>> + (logs (rdiv:s REAL_CST@0 @1)) >>> + (minus (logs! @0) (logs @1))) >>> + >>> + /* Simplify logN(a)+logN(b) into logN(a*b). */ >>> + (simplify >>> + (plus (logs:s @0) (logs:s @1)) >>> + (logs (mult @0 @1))) >>> + >>> + /* Simplify logN(a)-logN(b) into logN(a/b). */ >>> + (simplify >>> + (minus (logs:s @0) (logs:s @1)) >>> + (logs (rdiv @0 @1))))) >>> >>> I'm OK with the extra guards added but I'm also not a IEEE FP expert >>> here. In the previous review I did mention the extra constraints >>> for specific sub-patterns IIRC. >>> >>> As a general note we might want to implement a HONOR_ERRNO (combined_fn). >>> Probably a bad name, builtin_can_set_errno (combined_fn) might be better, >>> for example above I'm not sure whether all of log(), log2() and log10() >>> can set errno and how we generally should handle errno when >>> -fno-math-errno isn't given and GCC wasn't configured specifically >>> to target for example glibc. For glibc it might be nice if we could >>> tell it we're not interested in errno from math functions and in this >>> way convey -fno-math-errno to it for example via a crtfastmath like >>> mechanism (or by calling alternate entry points). >>> >>> So, the patch is OK in case Joseph doesn't have any comments during >>> this week. >> Thanks, then I will commit it at the end of the day, if Joseph does not have >> further comments. > > Also please add the bugzilla PR reference in the ChangeLog entry when > committing. > Thanks, > Kyrill Committed with 4be7d2d340a013d01a47c43d2feb6826d1b67af0. Thanks, Jennifer > > >> Best, >> Jennifer >>> >>> Thanks, >>> Richard. > >
smime.p7s
Description: S/MIME cryptographic signature