On Mon, 24 Jun 2019, Richard Biener wrote:
On Sun, Jun 23, 2019 at 12:22 AM Marc Glisse <marc.gli...@inria.fr> wrote:
On Sat, 22 Jun 2019, Richard Biener wrote:
On June 22, 2019 6:10:15 PM GMT+02:00, Marc Glisse <marc.gli...@inria.fr> wrote:
Hello,
as discussed in the PR, this seems like a simple enough approach to handle
FENV functionality safely, while keeping it possible to implement
optimizations in the future.
Some key missing things:
- handle C, not just C++ (I don't care, but some people probably do)
As you tackle C++, what does the standard say about constexpr contexts and
FENV? That is, what's the FP environment at compile time (I suppose
FENV-modifying functions are not declared constexpr).
The C++ standard doesn't care much about fenv:
[Note: This document does not require an implementation to support the
FENV_ACCESS pragma; it is implementation-defined (15.8) whether the pragma
is supported. As a consequence, it is implementation-defined whether
these functions can be used to test floating-point status flags, set
floating-point control modes, or run under non-default mode settings. If
the pragma is used to enable control over the floating-point environment,
this document does not specify the effect on floating-point evaluation in
constant expressions. — end note]
Oh, I see.
We should care about the C standard, and do whatever makes sense for C++
without expecting the C++ standard to tell us exactly what that is. We can
check what visual studio and intel do, but we don't have to follow them.
This makes it somewhat odd to implement this for C++ first and not C, but hey ;)
Well, I maintain a part of CGAL, a C++ library, that uses interval
arithmetic and thus relies on a non-default rounding direction. I am
trying to prepare this dog food so I can eat it myself...
-frounding-math is supposed to be equivalent to "#pragma stdc fenv_access
on" covering the whole program.
For constant expressions, I see a difference between
constexpr double third = 1. / 3.;
which really needs to be done at compile time, and
const double third = 1. / 3.;
which will try to evaluate the rhs as constexpr, but where the program is
still valid if that fails. The second one clearly should refuse to be
evaluated at compile time if we are specifying a dynamic rounding
direction. For the first one, I am not sure. I guess you should only write
that in "fenv_access off" regions and I wouldn't mind a compile error.
Note that C2x adds a pragma fenv_round that specifies a rounding direction
for a region of code, which seems relevant for constant expressions. That
pragma looks hard, but maybe some pieces would be nice to add.
Hmm. My thinking was along the lines that, at the start of main(), the
C abstract machine might specify that the initial rounding mode (and
exception state) is implementation-defined, and that all constant
expressions are evaluated whilst in this state. So we can define that to
be round-to-nearest and simply fold all constants, in contexts we are
allowed to evaluate at compile time, as we see them?
There are way too many such contexts. In C++, any initializer is
constexpr-evaluated if possible (PR 85746 shows that this is bad for
__builtin_constant_p), and I do want
double d = 1. / 3;
to depend on the dynamic rounding direction. I'd rather err on the other
extreme and only fold when we are forced to, say
constexpr double d = 1. / 3;
or even reject it because it is inexact, if pragmas put us in a region
with dynamic rounding.
I guess fenv_round aims at using a pragma to change the rounding mode?
Yes. You can specify either a fixed rounding mode, or "dynamic". In the
first case, it overrides the dynamic rounding mode.
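For illustration, this is the kind of code the C2x pragma is meant to
allow (a sketch only; GCC does not implement the pragma at this point):

double f (double a, double b)
{
#pragma STDC FENV_ROUND FE_UPWARD    /* fixed mode: overrides fesetround */
  return a + b;
}

double g (double a, double b)
{
#pragma STDC FENV_ROUND FE_DYNAMIC   /* explicitly use the dynamic mode */
  return a + b;
}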
- handle vectors (for complex, I don't know what it means)
Then flag_trapping_math should also enable this path, meaning that we
should stop making it the default, or performance will suffer.
Do we need N variants of the functions to really encode FP options into
the IL and thus allow inlining of, say, functions with different
signed-zero flags?
Not sure what you are suggesting. I am essentially creating a new
tree_code (well, an internal function) for an addition-like function that
actually reads/writes memory, so it should be orthogonal to inlining, and
only the front-end should care about -frounding-math. I didn't think about
the interaction with signed-zero. Ah, you mean
IFN_FENV_ADD_WITH_ROUNDING_AND_SIGNED_ZEROS, etc?
Yeah. Basically the goal is to have the IL fully defined on its own, without
having its semantic depend on flag_*.
The ones I am starting
from are supposed to be safe-for-everything. As refinement, I was thinking
in 2 directions:
* add a third constant argument, where we can specify extra info
* add a variant for the case where the function is pure (because I expect
that's easier on the compiler than "pure if (arg3 & 8) != 0")
I am not sure more variants are needed.
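Concretely, the IL for z = x + y would look roughly like this (an
illustrative rendering; the 3-argument form is only the refinement
mentioned above, and the flag value is made up):

  z_3 = .FENV_PLUS (x_1(D), y_2(D));      /* safe-for-everything */
  w_5 = .FENV_PLUS (x_1(D), y_4(D), 8);   /* e.g. (arg3 & 8) != 0 => pure */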
For optimization, having an ADD_ROUND_TO_ZERO (or the extra params
specifying an explicit rounding mode) might be interesting, since on x86
there are now instructions with rounding-mode control bits.
Yes. Pragma fenv_round would match well with that. On the other hand, it
would be painful for platforms that do not have such instructions, forcing
us to generate plenty of fe[gs]etround calls, and probably requiring a
pass to try and reduce their number.
Side remark, I am sad that Intel added rounded versions for scalars and
512 bit vectors but not for intermediate sizes, while I am most
interested in 128 bits. Masking most of the 512 bits still causes the
dreaded clock slow-down.
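For reference, what exists today looks like this (a sketch; needs
-mavx512f, and note the missing 128-bit variant):

#include <immintrin.h>

__m128d add_sd_down (__m128d a, __m128d b)   // scalar: available
{
  return _mm_add_round_sd (a, b, _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC);
}

__m512d add_pd_down (__m512d a, __m512d b)   // 512-bit: available
{
  return _mm512_add_round_pd (a, b, _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC);
}

// There is no _mm_add_round_pd: the 128-bit case has to go through a
// masked 512-bit operation, with the clock slow-down mentioned above.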
Also, while rounding clearly applies to an operation, signed-zero handling
kind of seems to apply to a variable, and for an operation I don't really
know whether it means that I can pretend that an argument of -0. is +0.
(I can return +inf for 1/-0.) or that I can return +0. when the operation
should return -0.. Probably both... If we have just -fsigned-zeros but no
rounding or trapping, the penalty of using an IFN would be bad. But indeed,
inlining functions with different -f(no-)signed-zeros forces us to use
-fsigned-zeros for the whole merged function if we don't encode it in the
operations. Hmm.
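To spell out the two readings (a small sketch; the volatiles again only
defeat folding):

#include <cstdio>

int main()
{
  volatile double nz = -0., x = -3.;
  // Reading 1: pretend a -0. argument is +0.
  // Strict semantics give -inf here; the relaxation may give +inf.
  std::printf("%g\n", 1. / nz);
  // Reading 2: produce +0. where the exact result is -0.
  // Strict semantics give -0. here; the relaxation may give +0.
  // (e.g. folding x * 0. to 0.).
  std::printf("%g\n", x * 0.);
}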
Yeah. I guess we need to think about each and every case and how
to deal with it. There are denormals and flush-to-zero (not covered by
POSIX fenv modification IIRC) and a lot of math optimization flags
that do not map to FP operations directly...
If we really try to model all that, at some point we may as well remove
PLUS_EXPR for floats...
.FENV_PLUS (x, y, flags)
where flags is a bitfield that specifies if we care about signed zeros,
signalling NaNs, what the rounding is (dynamic, don't care, up, down,
etc), if we care about exceptions, if we can do unsafe optimizations, if
we can contract +* into fma, etc. That would force us to rewrite a lot of
optimizations :-(
And CSE might become complicated with several expressions that differ only
in their flags.
.FENV_PLUS (x, y) was supposed to be equivalent to .FENV_PLUS (x, y,
safeflags) where safeflags are the strictest flags possible, while leaving
existing stuff like -funsafe-math-optimizations alone (so no regression),
with the idea that the version with flags would come later.
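Purely as illustration, the flags argument could encode something like
this (none of these names exist anywhere, it is just to fix ideas):

enum fenv_op_flags
{
  FENV_HONOR_SNANS        = 1 << 0,  /* may not drop signalling NaNs */
  FENV_HONOR_SIGNED_ZEROS = 1 << 1,  /* may not conflate +0. and -0. */
  FENV_HONOR_EXCEPTIONS   = 1 << 2,  /* reads/raises exception flags */
  FENV_ROUND_DYNAMIC      = 1 << 3,  /* else a fixed mode elsewhere  */
  FENV_ALLOW_CONTRACTION  = 1 << 4,  /* may contract a*b+c into fma  */
};

with "safeflags" being everything honored, dynamic rounding, and no
shortcuts allowed.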
I didn't look at the patch but I suppose you rely on RTL to not do code
motion across FENV modifications and not fold constants?
No, I rely on asm volatile to prevent that, as in your recent hack, except
that the asm only appears near expansion. I am trying to start from
something safe and refine with optimizations, no subtlety.
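Roughly this shape, i.e. an opaque barrier around the operation (a sketch
of the kludge, not the exact code in the patch):

static inline double fenv_plus (double a, double b)
{
  asm volatile ("" : "+g" (a));  // hide the operands from the folders
  asm volatile ("" : "+g" (b));
  double r = a + b;
  asm volatile ("" : "+g" (r));  // pin the result, keep the op ordered
  return r;
}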
Ah, OK. So indeed instead of a new pass doing the lowering on GIMPLE
this should ideally be done by populating expand_FENV_* appropriately.
Yes, I was lazy because it means I need to understand better how expansion
works :-(
That is, don't we really need unspec_volatile variant patterns for the
operations?
Yes. One future optimization (that I listed in the PR) is to let targets
expand those IFN as they like (without the asm barriers), using some
unspec_volatile. I hope we can get there, although just letting targets
replace "=g" with whatever in the asm would already get most of the
benefits.
I just thought of one issue for vector intrinsics, say _mm_add_pd, where
the fenv_access status that should matter is that of the caller, not the
one in emmintrin.h. But since I don't have the pragma or vectors, that can
wait.
True. I guess for the intrinsic headers we could invent some new attribute
(or assume such semantics for always_inline, which IIRC they are) saying
that a function inherits options from its caller (difficult if not
inlined; it would imply cloning, thus always-inline again...).
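(For reference, the intrinsics are indeed always_inline; emmintrin.h has
roughly:

extern __inline __m128d
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
_mm_add_pd (__m128d __A, __m128d __B)
{
  return (__m128d) ((__v2df)__A + (__v2df)__B);
}

so the + inside would pick up the header's FENV setting, not the caller's,
which is exactly the problem above.)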
On the patch: I'd name _DIV _RDIV (to match the tree code we are dealing
with). You're missing _NEGATE
True. I am only interested in -frounding-math, so my first reaction was
that I don't need to do anything for NEGATE, but indeed with a signalling
NaN anything can have an effect.
and also the _FIX_TRUNC and _FLOAT in case those might trap with
-ftrapping-math.
I don't know much about fixed point, and I didn't think about conversions
yet. I'll have to check what the C standard says about those.
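(For the float-to-integer direction, C's Annex F at least requires the
invalid exception for out-of-range values; a small check along these
lines, just a sketch:

#include <cfenv>
#include <cstdio>

int main()
{
  std::feclearexcept(FE_ALL_EXCEPT);
  volatile double big = 1e30;   // volatile: keep the conversion at run time
  volatile int i = (int) big;   // out of range => raises FE_INVALID
  std::printf("invalid: %d\n", !!std::fetestexcept(FE_INVALID));
  (void) i;
}

so with -ftrapping-math the conversion is not side-effect-free either.)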
There are also internal functions for POW, FMOD and others which are
ECF_CONST but may not end up being folded from their builtin
counter-part with -frounding-math.
I don't know how far this needs to go. SQRT has correctly rounded
instructions on several targets, so it is relevant. But unless your libm
provides a correctly-rounded implementation of pow, the compiler could
also ignore it. The new pragma fenv_round is scary in part because it
seems to imply that all math functions need to have a correctly-rounded
implementation.
I guess builtins need the same treatment for -ftrapping-math as they
do for -frounding-math. I think you already mentioned that the default
of this flag doesn't make much sense (well, the flag isn't fully
honored/implemented).
PR 54192
(coincidentally, it caused a missed vectorization in
https://stackoverflow.com/a/56681744/1918193 last week)
So I think the patch is a good start, but I'd say we should not introduce
the new pass but instead expand to the asm() kludge directly, which would
also make it easier to handle some ops as unspecs in the target.
This also answers what should be done with vectors, I'll need to add code
to tree-vect-generic for the new functions.
--
Marc Glisse