On Wed, Jul 14, 2021 at 10:56 AM Hongtao Liu <crazy...@gmail.com> wrote:
>
> On Wed, Jul 14, 2021 at 4:17 PM Richard Biener
> <richard.guent...@gmail.com> wrote:
> >
> > On Wed, Jul 14, 2021 at 10:11 AM Hongtao Liu <crazy...@gmail.com> wrote:
> > >
> > > On Wed, Jul 14, 2021 at 3:49 PM Matthias Kretz <m.kr...@gsi.de> wrote:
> > > >
> > > > On Wednesday, 14 July 2021 09:39:42 CEST Richard Biener wrote:
> > > > > -ffast-math decomposes to quite some flag_* and those generally are
> > > > > not reflected into the IL but can be different per function (and
> > > > > then prevent inlining).
> > > >
> > > > Is there any chance the "and then prevent inlining" can be eliminated?
> > > > Because then I could write my own fast<float> class in C++, marking
> > > > all operators with __attribute__((optimize("-Ofast")))...
> > > >
> > > > > There's one "related" IL feature used by the Fortran frontend -
> > > > > PAREN_EXPR prevents association across it. So for Fortran (when not
> > > > > -fno-protect-parens which is enabled by -Ofast), (a + b) - b cannot be
> > > > > optimized to a. Eventually this could be used to wrap intrinsic
> > > > > results since most of the issues in the end require association.
> > > > > Note PAREN_EXPR isn't exposed to the C family frontends but we could
> > > > > of course add a builtin-like thing for this _Noassoc ( .... ) or so.
> > > > > Note PAREN_EXPR
> > > after a simple grep, I see PAREN_EXPR is expanded to the common RTL
> > > pattern. So it doesn't prevent any reassociation at the rtl level?
> >
> > We don't perform any FP reassociation on RTL (and yes, the above relies on
> -ffast-math will imply flag_associative_math, and w/ that we do have
> reassociation on RTL
>
>     /* Reassociate floating point addition only when the user
>        specifies associative math operations.  */
>     if (FLOAT_MODE_P (mode)
>         && flag_associative_math)
>       {
>         tem = simplify_associative_operation (code, mode, op0, op1);
>         if (tem)
>           return tem;
>       }
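To make the association point concrete: in IEEE double arithmetic (a + b) - b is not always a, and that difference is exactly what flag_associative_math lets the compiler ignore. Below is a minimal C sketch, assuming GCC or Clang extended asm and binary64 doubles. `opaque_double` is a hypothetical helper standing in for the proposed (not existing) _Noassoc builtin; it uses an empty asm statement as an optimization barrier, in the spirit of the inline-asm workaround mentioned in the thread, and is not how PAREN_EXPR is implemented.

```c
/* Hypothetical helper, not a GCC API: an empty asm statement that claims
   to read and modify v in memory. The compiler cannot see through it, so
   it cannot reassociate arithmetic across the returned value. */
static inline double opaque_double(double v)
{
    __asm__("" : "+m"(v));
    return v;
}

/* (a + b) - b evaluated as written. Under plain IEEE semantics this is
   not always a: for a = 0.5, b = 2**52 the sum is a tie that rounds to
   the even neighbour 2**52, so the result is 0.0. With
   -fassociative-math the compiler may fold this to just a. */
double add_sub(double a, double b)
{
    return (a + b) - b;
}

/* Same expression, but the intermediate sum is hidden behind the asm
   barrier, so folding the whole expression to a is not possible even
   when associative math is enabled. */
double add_sub_barrier(double a, double b)
{
    return opaque_double(a + b) - b;
}
```

Built without -ffast-math, both functions should return 0.0 for a = 0.5, b = 2**52; with -fassociative-math only the barrier version is guaranteed to keep that result, while add_sub may legitimately return 0.5.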
Well, then we're lucky that none of the simplify_gen_binary stuff can
trigger here or rather we're likely never feeding it large enough RTL to do
anything, but yes, I can see that we eventually would optimize
2**52 - 2**52 to zero.  But we don't ;)  combine does

Trying 13 -> 14:
   13: r89:DF=r86:DF+r84:DF
      REG_DEAD r86:DF
   14: r89:DF=r89:DF-r84:DF
      REG_DEAD r84:DF
Failed to match this instruction:
(set (reg:DF 89)
    (minus:DF (plus:DF (reg:DF 86)
            (reg:DF 84))
        (reg:DF 84)))

which doesn't simplify even at -Ofast.  We don't try 6 -> 13 -> 14, likely
because of the dual-use

    6: r84:DF=[`*.LC0']
      REG_EQUAL 4.503599627370496e+15

I think that with a constant it might be simplified.

That said, FP reassoc on RTL is quite limited and I doubt anything relies on
it at all so we could even remove the remaining pieces.

Richard.

>
> > this).  We're also expanding rint() to x + 2**52 - 2**52 (ix86_expand_rint)
> > even with -ffast-math so we do rely on RTL optimizations not cancelling
> > the +-.
> >
> > Richard.
> >
> > >
> > > > > survives -Ofast so it's the frontends that would need to choose to
> > > > > emit or not emit it (or always emit it).
> > > >
> > > > Interesting. I want that builtin in C++. Currently I use inline asm to
> > > > achieve a similar effect. But the inline asm hammer is really too big
> > > > for the problem.
> > > >
> > > >
> > > > --
> > > > ──────────────────────────────────────────────────────────────────────────
> > > > Dr. Matthias Kretz
> > > > https://mattkretz.github.io
> > > > GSI Helmholtz Centre for Heavy Ion Research
> > > > https://gsi.de
> > > > std::experimental::simd
> > > > https://github.com/VcDevel/std-simd
> > > > ──────────────────────────────────────────────────────────────────────────
> > >
> > >
> > >
> > > --
> > > BR,
> > > Hongtao
> >
>
> --
> BR,
> Hongtao
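The rint() expansion discussed in this thread only works if the + and - of 2**52 both survive. Here is a C sketch of that idea, written from the "x + 2**52 - 2**52" description rather than copied from ix86_expand_rint. It assumes IEEE binary64 doubles evaluated in double precision (FLT_EVAL_METHOD == 0, e.g. x86-64 with SSE2), the default round-to-nearest mode, and a build without -fassociative-math. The name rint_via_two52 is made up for illustration, and the fabs/copysign sign handling is an assumption added so that negative inputs work.

```c
/* Hypothetical illustration of the 2**52 rounding trick (not GCC code).
   Assumes binary64 doubles, FLT_EVAL_METHOD == 0, round-to-nearest,
   and no -fassociative-math. */
double rint_via_two52(double x)
{
    const double two52 = 4503599627370496.0; /* 2**52 */
    double xa = __builtin_fabs(x);

    /* |x| >= 2**52 is already integral; NaN also fails the comparison
       and is returned unchanged. */
    if (!(xa < two52))
        return x;

    /* Adding 2**52 moves xa into a range where adjacent doubles are at
       least 1.0 apart, so the sum is rounded to an integer (ties to
       even). Subtracting 2**52 again is exact. If the compiler cancelled
       the two operations, this line would leave xa unchanged. */
    xa = (xa + two52) - two52;

    /* Restore the sign, so e.g. -0.3 rounds to -0.0 as rint() does. */
    return __builtin_copysign(xa, x);
}
```

With default flags, rint_via_two52(2.5) should give 2.0 and rint_via_two52(3.5) should give 4.0 (ties to even). If the + and - were cancelled, as -fassociative-math permits, the function would simply return x, which is why the expansion depends on RTL optimizations not folding them.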