On Mon, Jun 24, 2019 at 3:47 PM Marc Glisse <marc.gli...@inria.fr> wrote:
>
> On Mon, 24 Jun 2019, Richard Biener wrote:
>
> > On Sun, Jun 23, 2019 at 12:22 AM Marc Glisse <marc.gli...@inria.fr> wrote:
> >>
> >> On Sat, 22 Jun 2019, Richard Biener wrote:
> >>
> >>> On June 22, 2019 6:10:15 PM GMT+02:00, Marc Glisse <marc.gli...@inria.fr> 
> >>> wrote:
> >>>> Hello,
> >>>>
> >>>> as discussed in the PR, this seems like a simple enough approach to
> >>>> handle
> >>>> FENV functionality safely, while keeping it possible to implement
> >>>> optimizations in the future.
> >>>>
> >>>> Some key missing things:
> >>>> - handle C, not just C++ (I don't care, but some people probably do)
> >>>
> >>> As you tackle C++, what does the standard say about constexpr contexts and
> >>> FENV? That is, what's the FP environment at compile time (I suppose
> >>> FENV-modifying functions are not declared constexpr).
> >>
> >> The C++ standard doesn't care much about fenv:
> >>
> >> [Note: This document does not require an implementation to support the
> >> FENV_ACCESS pragma; it is implementation-defined (15.8) whether the pragma
> >> is supported. As a consequence, it is implementation-defined whether
> >> these functions can be used to test floating-point status flags, set
> >> floating-point control modes, or run under non-default mode settings. If
> >> the pragma is used to enable control over the floating-point environment,
> >> this document does not specify the effect on floating-point evaluation in
> >> constant expressions. — end note]
> >
> > Oh, I see.
> >
> >> We should care about the C standard, and do whatever makes sense for C++
> >> without expecting the C++ standard to tell us exactly what that is. We can
> >> check what visual studio and intel do, but we don't have to follow them.
> >
> > This makes it somewhat odd to implement this for C++ first and not C, but 
> > hey ;)
>
> Well, I maintain a part of CGAL, a C++ library, that uses interval
> arithmetic and thus relies on a non-default rounding direction. I am
> trying to prepare this dog food so I can eat it myself...

;)

> >> -frounding-math is supposed to be equivalent to "#pragma stdc fenv_access
> >> on" covering the whole program.
> >>
> >> For constant expressions, I see a difference between
> >> constexpr double third = 1. / 3.;
> >> which really needs to be done at compile time, and
> >> const double third = 1. / 3.;
> >> which will try to evaluate the rhs as constexpr, but where the program is
> >> still valid if that fails. The second one clearly should refuse to be
> >> evaluated at compile time if we are specifying a dynamic rounding
> >> direction. For the first one, I am not sure. I guess you should only write
> >> that in "fenv_access off" regions and I wouldn't mind a compile error.
> >>
> >> Note that C2x adds a pragma fenv_round that specifies a rounding direction
> >> for a region of code, which seems relevant for constant expressions. That
> >> pragma looks hard, but maybe some pieces would be nice to add.
> >
> > Hmm.  My thinking was along the line that at the start of main() the
> > C abstract machine might specify the initial rounding mode (and exception
> > state) is implementation defined and all constant expressions are evaluated
> > whilst being in this state.  So we can define that to round-to-nearest and
> > simply fold all constants in contexts we are allowed to evaluate at
> > compile-time as we see them?
>
> There are way too many such contexts. In C++, any initializer is
> constexpr-evaluated if possible (PR 85746 shows that this is bad for
> __builtin_constant_p), and I do want
> double d = 1. / 3;
> to depend on the dynamic rounding direction. I'd rather err on the other
> extreme and only fold when we are forced to, say
> constexpr double d = 1. / 3;
> or even reject it because it is inexact, if pragmas put us in a region
> with dynamic rounding.

OK, fair enough.  I just hoped that global

double x = 1.0/3.0;

does not become a runtime initializer with -frounding-math ...

> > I guess fenv_round aims at using a pragma to change the rounding mode?
>
> Yes. You can specify either a fixed rounding mode, or "dynamic". In the
> first case, it overrides the dynamic rounding mode.
>
> >>>> - handle vectors (for complex, I don't know what it means)
> >>>>
> >>>> Then flag_trapping_math should also enable this path, meaning that we
> >>>> should stop making it the default, or performance will suffer.
> >>>
> >>> Do we need N variants of the functions to really encode FP options into
> >>> the IL and thus allow inlining of say different signed-zero flag
> >>> functions?
> >>
> >> Not sure what you are suggesting. I am essentially creating a new
> >> tree_code (well, an internal function) for an addition-like function that
> >> actually reads/writes memory, so it should be orthogonal to inlining, and
> >> only the front-end should care about -frounding-math. I didn't think about
> >> the interaction with signed-zero. Ah, you mean
> >> IFN_FENV_ADD_WITH_ROUNDING_AND_SIGNED_ZEROS, etc?
> >
> > Yeah.  Basically the goal is to have the IL fully defined on its own,
> > without having its semantics depend on flag_*.
> >
> >> The ones I am starting
> >> from are supposed to be safe-for-everything. As refinement, I was thinking
> >> in 2 directions:
> >> * add a third constant argument, where we can specify extra info
> >> * add a variant for the case where the function is pure (because I expect
> >> that's easier on the compiler than "pure if (arg3 & 8) != 0")
> >> I am not sure more variants are needed.
> >
> > For optimization having a ADD_ROUND_TO_ZERO (or the extra params
> > specifying an explicit rounding mode) might be interesting since on x86
> > there are now instructions with rounding mode control bits.
>
> Yes. Pragma fenv_round would match well with that. On the other hand, it
> would be painful for platforms that do not have such instructions, forcing
> us to generate plenty of fe[gs]etround calls, and probably requiring a pass
> to try to reduce their number.
>
> Side remark, I am sad that Intel added rounded versions for scalars and
> 512 bit vectors but not for intermediate sizes, while I am most
> interested in 128 bits. Masking most of the 512 bits still causes the
> dreaded clock slow-down.

Ick.  I thought this was vector-length agnostic...

> >> Also, while rounding clearly applies to an operation, signed-zero kind of
> >> seems to apply to a variable, and in an operation, I don't really know if
> >> it means that I can pretend that an argument of -0. is +0. (I can return
> >> +inf for 1/-0.) or if it means I can return 0. when the operation should
> >> return -0.. Probably both... If we have just -fsigned-zeros but no
> >> rounding or trapping, the penalty of using an IFN would be bad. But indeed
> >> inlining functions with different -f(no-)signed-zeros forces to use
> >> -fsigned-zeros for the whole merged function if we don't encode it in the
> >> operations. Hmm
> >
> > Yeah.  I guess we need to think about each and every case and how
> > to deal with it.  There's denormals and flush-to-zero (not covered by
> > posix fenv modification IIRC) and a lot of math optimization flags
> > that do not map to FP operations directly...
>
> If we really try to model all that, at some point we may as well remove
> PLUS_EXPR for floats...
>
> .FENV_PLUS (x, y, flags)
>
> where flags is a bitfield that specifies if we care about signed zeros,
> signalling NaNs, what the rounding is (dynamic, don't care, up, down,
> etc), if we care about exceptions, if we can do unsafe optimizations, if
> we can contract +* into fma, etc. That would force us to rewrite a lot of
> optimizations :-(
>
> And CSE might become complicated with several expressions that differ only
> in their flags.
>
> .FENV_PLUS (x, y) was supposed to be equivalent to .FENV_PLUS (x, y,
> safeflags) where safeflags are the strictest flags possible, while leaving
> existing stuff like -funsafe-math-optimizations alone (so no regression),
> with the idea that the version with flags would come later.

Yeah, I'm fine with this incremental approach and with keeping it
constrained to FP environment access.

> >>> I didn't look at the patch but I suppose you rely on RTL to not do code
> >>> motion across FENV modifications and not fold constants?
> >>
> >> No, I rely on asm volatile to prevent that, as in your recent hack, except
> >> that the asm only appears near expansion. I am trying to start from
> >> something safe and refine with optimizations, no subtlety.
> >
> > Ah, OK.  So indeed instead of a new pass doing the lowering on GIMPLE
> > this should ideally be done by populating expand_FENV_* appropriately.
>
> Yes, I was lazy because it means I need to understand better how expansion
> works :-(

A bit of copy&paste from examples could do the trick I guess...

> >>> That is, don't we really need unspec_volatile variant patterns for the
> >>> operations?
> >>
> >> Yes. One future optimization (that I listed in the PR) is to let targets
> >> expand those IFN as they like (without the asm barriers), using some
> >> unspec_volatile. I hope we can get there, although just letting targets
> >> replace "=g" with whatever in the asm would already get most of the
> >> benefits.
> >>
> >>
> >>
> >> I just thought of one issue for vector intrinsics, say _mm_add_pd, where
> >> the fenv_access status that should matter is that of the caller, not the
> >> one in emmintrin.h. But since I don't have the pragma or vectors, that can
> >> wait.
> >
> > True.  I guess for the intrinsic headers we could invent some new attribute
> > (or assume such semantics for always_inline which IIRC they are) saying
> > that a function inherits options from the caller (difficult if not
> > inlined, it would
> > imply cloning, thus always-inline again...).
> >
> > On the patch I'd name _DIV _RDIV (to match the tree code we are dealing
> > with).  You miss _NEGATE
>
> True. I am only interested in -frounding-math, so my first reaction was
> that I don't need to do anything for NEGATE, but indeed with a signalling
> NaN anything can have an effect.
>
> > and also the _FIX_TRUNC and _FLOAT in case those might trap with
> > -ftrapping-math.
>
> I don't know much about fixed point, and I didn't think about conversions
> yet. I'll have to check what the C standard says about those.

FIX_TRUNC is float -> integer conversion (overflow/underflow flag?)

> > There are also internal functions for POW, FMOD and others which are
> > ECF_CONST but may not end up being folded from their builtin
> > counter-part with -frounding-math.
>
> I don't know how far this needs to go. SQRT has correctly rounded
> instructions on several targets, so it is relevant. But unless your libm
> provides a correctly-rounded implementation of pow, the compiler could
> also ignore it. The new pragma fenv_round is scary in part because it
> seems to imply that all math functions need to have a correctly rounding
> implementation.
>
> > I guess builtins need the same treatment for -ftrapping-math as they
> > do for -frounding-math.  I think you already mentioned the default
> > of this flag doesn't make much sense (well, the flag isn't fully
> > honored/implemented).
>
> PR 54192
> (coincidentally, it caused a missed vectorization in
> https://stackoverflow.com/a/56681744/1918193 last week)

I commented there.  Let's just make -frounding-math == FENV_ACCESS ON
and keep -ftrapping-math as controlling whether FP exceptions raise traps.

> > So I think the patch is a good start but I'd say we should not introduce
> > the new pass but instead expand to the asm() kludge directly which
> > would make it also easier to handle some ops as unspecs in the target.
>
> This also answers what should be done with vectors, I'll need to add code
> to tree-vect-generic for the new functions.

Yeah.  Auto-vectorization would also need adjustment, of course (as would
costing, e.g. estimate_num_insns).

Richard.

> --
> Marc Glisse
