On Mon, Jun 24, 2019 at 4:57 PM Marc Glisse <marc.gli...@inria.fr> wrote:
>
> On Mon, 24 Jun 2019, Richard Biener wrote:
>
> >>>> -frounding-math is supposed to be equivalent to "#pragma stdc fenv_access
> >>>> on" covering the whole program.
> >>>>
> >>>> For constant expressions, I see a difference between
> >>>> constexpr double third = 1. / 3.;
> >>>> which really needs to be done at compile time, and
> >>>> const double third = 1. / 3.;
> >>>> which will try to evaluate the rhs as constexpr, but where the program is
> >>>> still valid if that fails. The second one clearly should refuse to be
> >>>> evaluated at compile time if we are specifying a dynamic rounding
> >>>> direction. For the first one, I am not sure. I guess you should only write
> >>>> that in "fenv_access off" regions and I wouldn't mind a compile error.
> >>>>
> >>>> Note that C2x adds a pragma fenv_round that specifies a rounding direction
> >>>> for a region of code, which seems relevant for constant expressions. That
> >>>> pragma looks hard, but maybe some pieces would be nice to add.
> >>>
> >>> Hmm. My thinking was along the line that at the start of main() the
> >>> C abstract machine might specify that the initial rounding mode (and
> >>> exception state) is implementation-defined and all constant expressions
> >>> are evaluated whilst being in this state. So we can define that to
> >>> round-to-nearest and simply fold all constants in contexts we are
> >>> allowed to evaluate at compile time as we see them?
> >>
> >> There are way too many such contexts. In C++, any initializer is
> >> constexpr-evaluated if possible (PR 85746 shows that this is bad for
> >> __builtin_constant_p), and I do want
> >> double d = 1. / 3;
> >> to depend on the dynamic rounding direction. I'd rather err on the other
> >> extreme and only fold when we are forced to, say
> >> constexpr double d = 1. / 3;
> >> or even reject it because it is inexact, if pragmas put us in a region
> >> with dynamic rounding.
> >
> > OK, fair enough. I just hoped that a global
> >
> >   double x = 1.0/3.0;
> >
> > does not become a runtime initializer with -frounding-math ...
>
> Ah, I wasn't thinking of globals. Ignoring the new pragma fenv_round,
> which I guess could affect this (the C draft isn't very explicit), the
> program doesn't have many chances to set a rounding mode before
> initializing globals. It could do so in the initializer of another
> variable, but relying on the order of initialization this way seems bad.
> Maybe in this case it would make sense to assume the default rounding
> mode...
>
> In practice, I would only set -frounding-math on a per-function basis
> (possibly using pragma fenv_access), so the optimization of what happens
> to globals doesn't seem so important.
>
> >> Side remark, I am sad that Intel added rounded versions for scalars and
> >> 512-bit vectors but not for intermediate sizes, while I am most
> >> interested in 128 bits. Masking most of the 512 bits still causes the
> >> dreaded clock slow-down.
> >
> > Ick. I thought this was vector-length agnostic...
>
> I think all of the new stuff in AVX512 is, except rounding...
>
> Also, the rounded functions have exceptions disabled, which may make
> them hard to use with fenv_access.
>
> >>> I guess builtins need the same treatment for -ftrapping-math as they
> >>> do for -frounding-math. I think you already mentioned that the default
> >>> of this flag doesn't make much sense (well, the flag isn't fully
> >>> honored/implemented).
> >>
> >> PR 54192
> >> (coincidentally, it caused a missed vectorization in
> >> https://stackoverflow.com/a/56681744/1918193 last week)
> >
> > I commented there. Let's just make -frounding-math == FENV_ACCESS ON
> > and keep -ftrapping-math as whether FP exceptions raise traps.
>
> One issue is that the C pragmas do not let me convey that I am interested
> in dynamic rounding but not in exception flags. It is possible to optimize
> quite a bit more with just rounding. In particular, the functions are pure
> (at some point we will have to teach the compiler the difference between
> the FP environment and general memory, but I'd rather wait).
>
> > Yeah. Auto-vectorizing would also need adjustment of course (also
> > costing like estimate_num_insns or others).
>
> Anything that is only about optimizing the code in -frounding-math
> functions can wait; that's the good point of implementing a new feature.
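To make the dynamic-rounding behaviour in the quoted exchange concrete, here is a minimal, untested sketch. It assumes a target where fesetround and FE_UPWARD are available and has to be compiled with -frounding-math (since the FENV_ACCESS pragma is not implemented); the names are purely illustrative.

  #include <cfenv>
  #include <cstdio>

  // Must be folded at compile time, i.e. in round-to-nearest.
  constexpr double c = 1. / 3.;

  int main ()
  {
    std::fesetround (FE_UPWARD);
    // With -frounding-math this initializer should be evaluated at run
    // time and observe FE_UPWARD, so it may differ from c.
    const double d = 1. / 3.;
    std::printf ("%a\n%a\n", c, d);
  }

The sketch is only about evaluation time: c is folded in the compiler's round-to-nearest environment, while d should pick up whatever mode the program has set.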
Sure - the only thing we may want to avoid is designing ourselves into a
corner we cannot easily escape from.

Whenever I thought about -frounding-math and friends (and not doing
asm()-like hacks ;)) I thought we need to make the data dependence on the
FP environment explicit.  So I'd have done

  { FP result, new FP ENV state } = FENV_PLUS (op1, op2, old FP ENV state);

with the usual caveat of representing multiple return values.  Our
standard way via a projection riding on top of _Complex types works as
long as you use scalars and matching types; a more general projection
facility would use N-tuples of arbitrary component types (since those are
an implementation detail).  My usual alternative was (ab-)using asm()s,
since those can have multiple outputs and provide internal-function-like
asm-body IDs more or less directly mapping to RTL instructions, for
example.  When using global memory as the FENV state you use virtual
operands for this.

And indeed, for -frounding-math the operations themselves do not change
the FP environment (thus are pure), and the memory approach looks easiest
(it's already implemented this way for builtins).

Given the pace of improving -frounding-math support in the past I think
it's fine to continue in this direction.

Richard.

> --
> Marc Glisse
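For concreteness, a rough user-level model of the FENV_PLUS shape above, untested and with every name invented for illustration (not a proposed interface; real code would still need -frounding-math so the addition is not moved across the environment accesses):

  #include <cfenv>
  #include <utility>

  struct fp_env { std::fenv_t state; };   // stands in for "FP ENV state"

  // { FP result, new FP ENV state } = FENV_PLUS (op1, op2, old FP ENV state)
  static std::pair<double, fp_env>
  fenv_plus (double op1, double op2, fp_env env)
  {
    std::fesetenv (&env.state);   // make the incoming state current
    double result = op1 + op2;    // rounds and raises flags according to env
    std::fegetenv (&env.state);   // capture the updated state
    return {result, env};
  }

The point is only the data-flow shape: the result and the new environment both depend on the old one, which is what the IL would make explicit, whether via a projection, an asm, or virtual operands on memory.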