On Wed, Jul 14, 2021 at 10:56 AM Hongtao Liu <crazy...@gmail.com> wrote:
>
> On Wed, Jul 14, 2021 at 4:17 PM Richard Biener
> <richard.guent...@gmail.com> wrote:
> >
> > On Wed, Jul 14, 2021 at 10:11 AM Hongtao Liu <crazy...@gmail.com> wrote:
> > >
> > > On Wed, Jul 14, 2021 at 3:49 PM Matthias Kretz <m.kr...@gsi.de> wrote:
> > > >
> > > > On Wednesday, 14 July 2021 09:39:42 CEST Richard Biener wrote:
> > > > > > -ffast-math decomposes to quite some flag_* and those generally are
> > > > > > not reflected into the IL but can be different per function (and then
> > > > > > prevent inlining).
> > > >
> > > > > Is there any chance the "and then prevent inlining" can be eliminated?
> > > > > Because then I could write my own fast<float> class in C++, marking all
> > > > > operators with __attribute__((optimize("-Ofast")))...
> > > >
> > > > > > There's one "related" IL feature used by the Fortran frontend -
> > > > > > PAREN_EXPR prevents association across it.  So for Fortran (when not
> > > > > > -fno-protect-parens which is enabled by -Ofast), (a + b) - b cannot be
> > > > > > optimized to a.  Eventually this could be used to wrap intrinsic results
> > > > > > since most of the issues in the end require association.  Note PAREN_EXPR
> > > > > > isn't exposed to the C family frontends but we could of course add a
> > > > > > builtin-like thing for this _Noassoc ( .... ) or so.  Note PAREN_EXPR
> > > after a simple grep, I see PAREN_EXPR is expanded to the common RTL
> > > pattern. So it doesn't prevent any reassociation at the rtl level?
> >
> > We don't perform any FP reassociation on RTL (and yes, the above relies on
> -ffast-math will imply flag_associative_math, and w/ that we do have
> reassociation on RTL
>
>       /* Reassociate floating point addition only when the user
>          specifies associative math operations.  */
>       if (FLOAT_MODE_P (mode)
>           && flag_associative_math)
>         {
>           tem = simplify_associative_operation (code, mode, op0, op1);
>           if (tem)
>             return tem;
>         }

Well, then we're lucky that none of the simplify_gen_binary stuff can
trigger here, or rather we're likely never feeding it large enough RTL to
do anything.  But yes, I can see that we eventually would optimize
2**52 - 2**52 to zero.  But we don't ;)  combine does try it:

Trying 13 -> 14:
   13: r89:DF=r86:DF+r84:DF
      REG_DEAD r86:DF
   14: r89:DF=r89:DF-r84:DF
      REG_DEAD r84:DF
Failed to match this instruction:
(set (reg:DF 89)
    (minus:DF (plus:DF (reg:DF 86)
            (reg:DF 84))
        (reg:DF 84)))

which doesn't simplify even at -Ofast.  We don't try 6 -> 13 -> 14,
likely because of the dual use of r84:

    6: r84:DF=[`*.LC0']
      REG_EQUAL 4.503599627370496e+15

I think that with a constant it might be simplified.  That said, FP
reassoc on RTL is quite limited and I doubt anything relies on it at all,
so we could even remove the remaining pieces.
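
To make concrete why cancelling the +- would matter, here is a minimal C
sketch of the x + 2**52 - 2**52 rounding trick mentioned above for
ix86_expand_rint.  It is an illustration only, not the GCC expander: the
names TWO52, rint_via_magic and rint_reassociated are hypothetical, the
volatile temporary is just a way to keep a compiler from folding the two
operations, and it assumes IEEE double, round-to-nearest-even and
0 <= x < 2**52.

```c
#include <stdio.h>

/* 2**52: from this magnitude on, doubles have no fractional bits left. */
static const double TWO52 = 4503599627370496.0;

/* Round x to an integer using the current rounding mode.
   Assumes 0 <= x < 2**52.  The volatile temporary forces each
   intermediate result to be rounded to double, so the addition and
   subtraction cannot be reassociated away, even with -ffast-math. */
double rint_via_magic(double x)
{
    volatile double t = x + TWO52;  /* fractional bits are rounded off here */
    t = t - TWO52;                  /* exact: recovers the rounded integer */
    return t;
}

/* What the function would degenerate to if (x + c) - c were
   reassociated into x: no rounding happens at all. */
double rint_reassociated(double x)
{
    return x;
}
```

Calling rint_via_magic(2.5) should give 2.0 under round-to-nearest-even,
while rint_reassociated(2.5) returns 2.5 unchanged, which is the breakage
that relying on RTL not cancelling the +- avoids.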

Richard.

>
> > this).  We're also expanding rint() to x + 2**52 - 2**52 (ix86_expand_rint)
> > even with -ffast-math so we do rely on RTL optimizations not cancelling
> > the +-.
> >
> > Richard.
> >
> > >
> > > > > > survives -Ofast so it's the frontends that would need to choose to
> > > > > > emit or not emit it (or always emit it).
> > > >
> > > > > Interesting. I want that builtin in C++. Currently I use inline asm to
> > > > > achieve a similar effect. But the inline asm hammer is really too big
> > > > > for the problem.
> > > >
> > > >
> > > > --
> > > > ──────────────────────────────────────────────────────────────────────────
> > > >  Dr. Matthias Kretz                           
> > > > https://mattkretz.github.io
> > > >  GSI Helmholtz Centre for Heavy Ion Research               
> > > > https://gsi.de
> > > >  std::experimental::simd              
> > > > https://github.com/VcDevel/std-simd
> > > > ──────────────────────────────────────────────────────────────────────────
> > >
> > >
> > >
> > > --
> > > BR,
> > > Hongtao
>
>
>
> --
> BR,
> Hongtao
