On Mon, Jun 24, 2019 at 3:47 PM Marc Glisse <marc.gli...@inria.fr> wrote:
>
> On Mon, 24 Jun 2019, Richard Biener wrote:
>
> > On Sun, Jun 23, 2019 at 12:22 AM Marc Glisse <marc.gli...@inria.fr> wrote:
> >>
> >> On Sat, 22 Jun 2019, Richard Biener wrote:
> >>
> >>> On June 22, 2019 6:10:15 PM GMT+02:00, Marc Glisse
> >>> <marc.gli...@inria.fr> wrote:
> >>>> Hello,
> >>>>
> >>>> as discussed in the PR, this seems like a simple enough approach to
> >>>> handle FENV functionality safely, while keeping it possible to
> >>>> implement optimizations in the future.
> >>>>
> >>>> Some key missing things:
> >>>> - handle C, not just C++ (I don't care, but some people probably do)
> >>>
> >>> As you tackle C++, what does the standard say about constexpr contexts
> >>> and FENV? That is, what's the FP environment at compile time (I suppose
> >>> FENV-modifying functions are not declared constexpr).
> >>
> >> The C++ standard doesn't care much about fenv:
> >>
> >> [Note: This document does not require an implementation to support the
> >> FENV_ACCESS pragma; it is implementation-defined (15.8) whether the
> >> pragma is supported. As a consequence, it is implementation-defined
> >> whether these functions can be used to test floating-point status
> >> flags, set floating-point control modes, or run under non-default mode
> >> settings. If the pragma is used to enable control over the
> >> floating-point environment, this document does not specify the effect
> >> on floating-point evaluation in constant expressions. — end note]
> >
> > Oh, I see.
> >
> >> We should care about the C standard, and do whatever makes sense for
> >> C++ without expecting the C++ standard to tell us exactly what that is.
> >> We can check what Visual Studio and Intel do, but we don't have to
> >> follow them.
> >
> > This makes it somewhat odd to implement this for C++ first and not C,
> > but hey ;)
>
> Well, I maintain a part of CGAL, a C++ library, that uses interval
> arithmetic and thus relies on a non-default rounding direction. I am
> trying to prepare this dog food so I can eat it myself...
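[Editor's note: for readers unfamiliar with the interval-arithmetic style of code Marc is referring to, the classic trick is to leave the rounding mode at FE_UPWARD and store the lower bound negated, so both endpoints round outward. A minimal sketch, not actual CGAL code; the `Interval` type and `interval_add` name are illustrative:]

```cpp
#include <cfenv>

// Illustrative only -- not CGAL's actual code.  With the rounding
// mode held at FE_UPWARD, storing the lower bound negated means
// rounding "up" rounds both endpoints of the interval outward.
struct Interval {
    double neg_lo;  // -(lower bound)
    double hi;      // upper bound
};

// Assumes the caller has already done std::fesetround(FE_UPWARD).
// The volatile reads are a crude stand-in for -frounding-math:
// without them the compiler may fold or move the additions across
// fesetround calls, which is exactly the problem in this thread.
Interval interval_add(Interval a, Interval b) {
    volatile double nlo = a.neg_lo, hi = a.hi;
    return { nlo + b.neg_lo, hi + b.hi };  // both sums rounded upward
}
```

The resulting interval is guaranteed to contain the true sum as long as the compiler does not fold the additions at compile time, which is precisely what -frounding-math is meant to guarantee.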
;)

> >> -frounding-math is supposed to be equivalent to "#pragma STDC
> >> FENV_ACCESS ON" covering the whole program.
> >>
> >> For constant expressions, I see a difference between
> >>   constexpr double third = 1. / 3.;
> >> which really needs to be done at compile time, and
> >>   const double third = 1. / 3.;
> >> which will try to evaluate the rhs as constexpr, but where the program
> >> is still valid if that fails. The second one clearly should refuse to
> >> be evaluated at compile time if we are specifying a dynamic rounding
> >> direction. For the first one, I am not sure. I guess you should only
> >> write that in "FENV_ACCESS OFF" regions, and I wouldn't mind a compile
> >> error.
> >>
> >> Note that C2x adds a pragma fenv_round that specifies a rounding
> >> direction for a region of code, which seems relevant for constant
> >> expressions. That pragma looks hard, but maybe some pieces would be
> >> nice to add.
> >
> > Hmm. My thinking was along the line that at the start of main() the
> > C abstract machine might specify the initial rounding mode (and
> > exception state) is implementation-defined, and all constant
> > expressions are evaluated whilst being in this state. So we can define
> > that to round-to-nearest and simply fold all constants in contexts we
> > are allowed to evaluate at compile time as we see them?
>
> There are way too many such contexts. In C++, any initializer is
> constexpr-evaluated if possible (PR 85746 shows that this is bad for
> __builtin_constant_p), and I do want
>   double d = 1. / 3;
> to depend on the dynamic rounding direction. I'd rather err on the other
> extreme and only fold when we are forced to, say
>   constexpr double d = 1. / 3;
> or even reject it because it is inexact, if pragmas put us in a region
> with dynamic rounding.

OK, fair enough. I just hoped that a global
  double x = 1.0 / 3.0;
does not become a runtime initializer with -frounding-math ...

> > I guess fenv_round aims at using a pragma to change the rounding mode?
>
> Yes. You can specify either a fixed rounding mode, or "dynamic". In the
> first case, it overrides the dynamic rounding mode.
>
> >>>> - handle vectors (for complex, I don't know what it means)
> >>>>
> >>>> Then flag_trapping_math should also enable this path, meaning that
> >>>> we should stop making it the default, or performance will suffer.
> >>>
> >>> Do we need N variants of the functions to really encode FP options
> >>> into the IL and thus allow inlining of say different signed-zero flag
> >>> functions?
> >>
> >> Not sure what you are suggesting. I am essentially creating a new
> >> tree_code (well, an internal function) for an addition-like function
> >> that actually reads/writes memory, so it should be orthogonal to
> >> inlining, and only the front-end should care about -frounding-math. I
> >> didn't think about the interaction with signed zeros. Ah, you mean
> >> IFN_FENV_ADD_WITH_ROUNDING_AND_SIGNED_ZEROS, etc.?
> >
> > Yeah. Basically the goal is to have the IL fully defined on its own,
> > without having its semantics depend on flag_*.
> >
> >> The ones I am starting from are supposed to be safe-for-everything.
> >> As refinement, I was thinking in 2 directions:
> >> * add a third constant argument, where we can specify extra info
> >> * add a variant for the case where the function is pure (because I
> >>   expect that's easier on the compiler than "pure if (arg3 & 8) != 0")
> >> I am not sure more variants are needed.
> >
> > For optimization, having an ADD_ROUND_TO_ZERO (or the extra params
> > specifying an explicit rounding mode) might be interesting, since on
> > x86 there are now instructions with rounding-mode control bits.
>
> Yes. Pragma fenv_round would match well with that. On the other hand, it
> would be painful for platforms that do not have such instructions,
> forcing us to generate plenty of fe[gs]etround, and probably to have a
> pass that tries to reduce their number.
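[Editor's note: concretely, on a target without per-instruction rounding control, a statically rounded operation would have to expand to a save/switch/compute/restore sequence like the following hand-written illustration (the function name is made up; this is not generated code):]

```cpp
#include <cfenv>

// Hand-written illustration of what a statically rounded add (e.g.
// inside a "#pragma STDC FENV_ROUND FE_UPWARD" region) would expand
// to on a target without rounding-mode control bits: save the dynamic
// mode, switch to the static one, do the operation, restore.  The
// pass mentioned above would try to merge adjacent pairs of these.
double add_round_upward(double x, double y) {
    const int old_mode = std::fegetround();
    std::fesetround(FE_UPWARD);
    volatile double r = x + y;   // volatile: keep the add between the calls
    std::fesetround(old_mode);
    return r;
}
```

Doing this around every operation is exactly the overhead Marc describes, hence the need for a pass that coalesces neighbouring fesetround calls.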
>
> Side remark: I am sad that Intel added rounded versions for scalars and
> 512-bit vectors but not for intermediate sizes, while I am most
> interested in 128 bits. Masking most of the 512 bits still causes the
> dreaded clock slow-down.

Ick. I thought this was vector-length agnostic...

> >> Also, while rounding clearly applies to an operation, signed zero
> >> kind of seems to apply to a variable, and in an operation, I don't
> >> really know if it means that I can pretend that an argument of -0. is
> >> +0. (I can return +inf for 1/-0.) or if it means I can return 0. when
> >> the operation should return -0.. Probably both... If we have just
> >> -fsigned-zeros but no rounding or trapping, the penalty of using an
> >> IFN would be bad. But indeed inlining functions with different
> >> -f(no-)signed-zeros forces us to use -fsigned-zeros for the whole
> >> merged function if we don't encode it in the operations. Hmm.
> >
> > Yeah. I guess we need to think about each and every case and how
> > to deal with it. There's denormals and flush-to-zero (not covered by
> > POSIX fenv modification IIRC) and a lot of math optimization flags
> > that do not map to FP operations directly...
>
> If we really try to model all that, at some point we may as well remove
> PLUS_EXPR for floats...
>
>   .FENV_PLUS (x, y, flags)
>
> where flags is a bitfield that specifies whether we care about signed
> zeros, signalling NaNs, what the rounding is (dynamic, don't care, up,
> down, etc.), whether we care about exceptions, whether we can do unsafe
> optimizations, whether we can contract +* into an fma, etc. That would
> force us to rewrite a lot of optimizations :-(
>
> And CSE might become complicated with several expressions that differ
> only in their flags.
>
> .FENV_PLUS (x, y) was supposed to be equivalent to .FENV_PLUS (x, y,
> safeflags), where safeflags are the strictest flags possible, while
> leaving existing stuff like -funsafe-math-optimizations alone (so no
> regression), with the idea that the version with flags would come later.

Yeah, I'm fine with this incremental approach, and with it really being
constrained to FP environment access.

> >>> I didn't look at the patch, but I suppose you rely on RTL to not do
> >>> code motion across FENV modifications and not fold constants?
> >>
> >> No, I rely on asm volatile to prevent that, as in your recent hack,
> >> except that the asm only appears near expansion. I am trying to start
> >> from something safe and refine with optimizations, no subtlety.
> >
> > Ah, OK. So indeed, instead of a new pass doing the lowering on GIMPLE,
> > this should ideally be done by populating expand_FENV_* appropriately.
>
> Yes, I was lazy because it means I need to understand better how
> expansion works :-(

A bit of copy&paste from examples could do the trick I guess...

> >>> That is, don't we really need unspec_volatile variant patterns for
> >>> the operations?
> >>
> >> Yes. One future optimization (that I listed in the PR) is to let
> >> targets expand those IFNs as they like (without the asm barriers),
> >> using some unspec_volatile. I hope we can get there, although just
> >> letting targets replace "=g" with whatever in the asm would already
> >> get most of the benefits.
> >>
> >> I just thought of one issue for vector intrinsics, say _mm_add_pd,
> >> where the fenv_access status that should matter is that of the caller,
> >> not the one in emmintrin.h. But since I don't have the pragma or
> >> vectors, that can wait.
> >
> > True.
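[Editor's note: in source form, the asm-volatile kludge being discussed looks roughly like the sketch below (GCC extended asm). The function name is illustrative; the real lowering happens at RTL expansion, and the "+g" constraints mirror the "=g" mentioned above:]

```cpp
// Sketch of the asm-barrier trick (GCC/Clang extended asm only).
static inline double fenv_add(double x, double y) {
    // The empty asm pretends it may modify its operands, so the
    // compiler can no longer constant-fold x + y ...
    asm volatile("" : "+g"(x), "+g"(y));
    double r = x + y;
    // ... and a second barrier on the result pins the addition in
    // place, preventing code motion across fesetround() calls.
    asm volatile("" : "+g"(r));
    return r;
}
```

The asm bodies are empty, so no extra instructions are emitted; only the optimizers' assumptions change. Letting a target swap "g" for a better constraint avoids needless spills, which is the refinement mentioned above.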
> > I guess for the intrinsic headers we could invent some new attribute
> > (or assume such semantics for always_inline, which IIRC they are)
> > saying that a function inherits options from the caller (difficult if
> > not inlined; it would imply cloning, thus always-inline again...).
> >
> > On the patch, I'd name _DIV _RDIV (to match the tree code we are
> > dealing with). You miss _NEGATE
>
> True. I am only interested in -frounding-math, so my first reaction was
> that I don't need to do anything for NEGATE, but indeed with a
> signalling NaN anything can have an effect.
>
> > and also the _FIX_TRUNC and _FLOAT in case those might trap with
> > -ftrapping-math.
>
> I don't know much about fixed point, and I didn't think about
> conversions yet. I'll have to check what the C standard says about
> those.

FIX_TRUNC is float -> integer conversion (overflow/underflow flag?)

> > There are also internal functions for POW, FMOD and others which are
> > ECF_CONST but may not end up being folded from their builtin
> > counterpart with -frounding-math.
>
> I don't know how far this needs to go. SQRT has correctly rounded
> instructions on several targets, so it is relevant. But unless your libm
> provides a correctly rounded implementation of pow, the compiler could
> also ignore it. The new pragma fenv_round is scary in part because it
> seems to imply that all math functions need to have a correctly rounded
> implementation.

> > I guess builtins need the same treatment for -ftrapping-math as they
> > do for -frounding-math. I think you already mentioned the default
> > of this flag doesn't make much sense (well, the flag isn't fully
> > honored/implemented).
>
> PR 54192
> (coincidentally, it caused a missed vectorization in
> https://stackoverflow.com/a/56681744/1918193 last week)

I commented there. Let's just make -frounding-math == FENV_ACCESS ON and
keep -ftrapping-math as whether FP exceptions raise traps.
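[Editor's note: to the parenthetical question above: yes, a float-to-integer conversion raises FE_INVALID when the value does not fit the target type (C11 Annex F.4); whether it raises FE_INEXACT for in-range fractional values is unspecified. A quick self-contained check; the function name is made up for illustration:]

```cpp
#include <cfenv>

// Shows which exception flags a FIX_TRUNC-style conversion raises at
// run time.  Whether the compiler preserves these flags across
// optimization is exactly what -ftrapping-math is supposed to govern.
int conversion_flags(double d) {
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double v = d;                            // keep the conversion
    volatile long long n = static_cast<long long>(v); // at run time
    (void)n;
    return std::fetestexcept(FE_INVALID | FE_INEXACT);
}
```

On typical hardware (x86 CVTTSD2SI, AArch64 FCVTZS) an out-of-range conversion sets the invalid flag, so IFN variants of FIX_TRUNC need the same barriers as the arithmetic operations.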
> > So I think the patch is a good start, but I'd say we should not
> > introduce the new pass but instead expand to the asm() kludge directly,
> > which would also make it easier to handle some ops as unspecs in the
> > target.
>
> This also answers what should be done with vectors: I'll need to add
> code to tree-vect-generic for the new functions.

Yeah. Auto-vectorizing would also need adjustment of course (also
costing, like estimate_num_insns or others).

Richard.

> --
> Marc Glisse