Thank you for your comments.

On Fri, 7 Aug 2020, Richard Biener wrote:

Conversions look like
.FENV_CONVERT (arg, (target_type*)0, 0)
The pointer is there so we know the target type, even if the lhs
disappears at some point. The last 0 is the same as for all the others: a
place to store options about the operation (do we care about rounding,
about exceptions, etc.); it is just a placeholder for now. I could rename
it to .FENV_NOP since we seem to generate NOP usually, but that looked
strange to me.

You could carry the info in the existing flags operand if you make that a
pointer ...

Ah, true, I forgot that some other trees already use this kind of trick.
Not super pretty, but probably better than an extra argument.

Adding some info missing above from reading the patch.

The idea seems to be to turn FP operations like PLUS_EXPR, FLOAT_EXPR
but also (only?) calls to BUILT_IN_SQRT to internal functions named
IFN_FENV_* where the internal function presumably has some extra
information.

Sqrt does seem to have a special place in IEEE 754, and in practice some
targets have instructions (with rounding) for it.

You have

+/* float operations with rounding / exception flags.  */
+DEF_INTERNAL_FN (FENV_PLUS, ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (FENV_MINUS, ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (FENV_MULT, ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (FENV_RDIV, ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (FENV_FLOAT, ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (FENV_CONVERT, ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (FENV_SQRT, ECF_LEAF | ECF_NOTHROW, NULL)

so with -fnon-call-exceptions they will not be throwing (but regular
FP PLUS_EXPR would).

Hmm, ok, I guess I should remove ECF_NOTHROW then; the priority should
be correctness, and we can carefully reintroduce optimizations later.

They will appear to alter memory state - that's probably to have the
extra dependence on FENV changing/querying operations but then why do you
still need to emit asm()s?

The IFNs are for GIMPLE and represent the operations, while the asms are simple pass-throughs for RTL; I replace the former with the latter (plus the regular operation) at expansion.

I suppose the (currently unused) flags parameter could be populated with
some known FP ENV state and then limited optimization across stmts
with the same non-zero state could be done?

I was mostly thinking of storing information like:
* don't care about the rounding mode for this operation
* may drop exceptions produced by this operation
* may produce extra exceptions
* don't care about signed zero
* may contract into FMA
* don't care about errno (for sqrt?)
etc

With fenv_round, we would actually have to store the rounding mode of
the operation (upward, towards-zero, dynamic, don't-care, etc), a bit
less nice because 0 is not a safe fallback anymore. We could also store
it when we detect a call to fesetround before, but we have to be careful
that this doesn't result in even more calls to fesetround at expansion
for targets that do not have statically rounded operations.

If there are other, better things to store there, great.

Using internal function calls paints us a bit into a corner since they are still
subject to the single-SSA def restriction in case we'd want to make FENV
dataflow more explicit.  What's the advantage of internal functions compared
to using asms for the operations themselves if we wrap this class into
a set of "nicer" helpers?

I wanted the representation in GIMPLE to look reasonably nice, so it would be both easy to read in the dumps and not too hard to write optimizations for, and a function call looked good enough. Making FENV dataflow explicit would mean having PHIs for FENV, etc.? At most, I thought FENV would be represented by one specific memory region, which in particular would not alias user variables of type float or double.

I don't really see what it would look like with asms and helpers. In some sense, the IFNs are already wrappers that we unwrap at expansion. Your asms would take some FENV as input and output, so we would have to track which FENV to use where, similar to .MEM.

One complication with tracking data-flow is "unknown" stuff, I'd suggest
to invent a mediator between memory state and FP state which would
semantically be load and store operations of the FP state from/to memory.

All I can think of is to make the FP state a particular variable in memory, and to teach alias analysis that those functions only read/write that variable. What do you have in mind, splitting operations as:

fenv0 = read_fenv()
(res, fenv1) = oper(arg0, arg1, fenv0)
store_fenv(fenv1)

so that "oper" itself is const? (and hopefully simplifying consecutive read_fenv/store_fenv pairs so there are fewer of them) I wonder if lying about the constness of the operation may be problematic.

(and asm would be abused as a way to return a pair, with hopefully some marker so we know it isn't a real asm)

That said, you're the one doing the work and going with internal functions
is reasonable - I'm not sure to what extent optimization for FENV access
code will ever be possible (or wanted/expected).  So going more precise
might not have any advantage.

I think some optimizations are expected. For instance, not having to re-read the same number from memory many times just because there was an addition in between (which could write to fenv but that's it). Some may still want FMA (with a consistent rounding direction). For those (like me) who usually only care about rounding and not exceptions, making the operations pure would be great, and nothing says we cannot vectorize those rounded operations!

I am trying to be realistic with what I can achieve, but if you think the IFNs would paint us into a corner, then we can drop this approach.

You needed to guard SQRT - will you need to guard other math functions?
(round, etc.)

Maybe, but probably not many. I thought I might have to guard all of them (sin, cos, etc), but IIRC Joseph's comment seemed to imply that this wouldn't be necessary. I am likely missing FMA now...

If we need to keep the IFNs use memory state they will count towards
walk limits of the alias oracle even if they can be disambiguated against.
This will affect both compile-time and optimizations.

Yes...

+  /* Careful not to end up with something like X - X, which could get
+     simplified.  */
+  if (!skip0 && already_protected (op1))

we're already relying on RTL not optimizing (x + 0.5) - 0.5, but since
that would involve association, the simple X - X case might indeed
be optimized (but wouldn't that be a bug if it is not correct?)

Indeed we do not currently simplify X-X without -ffinite-math-only. However, I am trying to be safe, and whether we can simplify or not is something that depends on each operation (what the pragma said at that point in the source code), while flag_finite_math_only is at best per function.

--
Marc Glisse