On Fri, 7 Aug 2020, Richard Biener wrote:
I was mostly thinking of storing information like:
* don't care about the rounding mode for this operation
* may drop exceptions produced by this operation
* may produce extra exceptions
* don't care about signed zero
* may contract into FMA
* don't care about errno (for sqrt?)
etc
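For illustration only, this could be a bitmask carried as the extra
argument of the IFNs; a rough sketch with made-up names (nothing like
this exists in GCC today):

/* Hypothetical per-operation FP flags, one bit each.  */
enum fenv_op_flags
{
  FENV_IGNORE_ROUNDING    = 1 << 0, /* don't care about the rounding mode */
  FENV_MAY_DROP_EXCEPT    = 1 << 1, /* may drop exceptions this op produces */
  FENV_MAY_ADD_EXCEPT     = 1 << 2, /* may produce extra exceptions */
  FENV_IGNORE_SIGNED_ZERO = 1 << 3, /* don't care about signed zero */
  FENV_ALLOW_CONTRACT     = 1 << 4, /* may contract into FMA */
  FENV_IGNORE_ERRNO       = 1 << 5  /* don't care about errno (sqrt etc.) */
};
/* The constant last argument of the IFNs in the examples further down
   could carry such a mask.  */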
So we could leverage the same mechanism for inlining a non-ffast-math
function into a -ffast-math function, rewriting operations to IFNs?
Yes.
Though the resulting loss of optimization might offset any benefit we get
from the inlining...
I was hoping enough optimizations would still be possible. With the right
flags, the function could be marked pure or const, could be vectorized,
etc. We could go through the transformations in match.pd and copy each one
for the IFN, checking the relevant set of flags (although the
transformations might need to be done more manually in forwprop if
match.pd cannot handle them).
At least the above list somewhat suggests it wants to capture the
various -f*-math options.
Originally I only wanted rounding and exceptions, but this looked like a
sensible generalization after a previous discussion.
One complication with tracking data flow is "unknown" stuff; I'd suggest
inventing a mediator between memory state and FP state, which would
semantically be load and store operations of the FP state from/to memory.
All I can think of is making the FP state a particular variable in memory
and teaching alias analysis that those functions only read/write this
variable. What do you have in mind, splitting operations as:
fenv0 = read_fenv()
(res, fenv1) = oper(arg0, arg1, fenv0)
store_fenv(fenv1)
so that "oper" itself is const? (and hopefully simplify consecutive
read_fenv/store_fenv so there are fewer of them) I wonder if lying about
the constness of the operation may be problematic.
Kind-of. I thought to do this around "unknown" operations like function
calls only:
store_fenv(fenv0);
foo ();
fenv0 = read_fenv();
In what I described a few lines above, that's roughly what would remain
after simplification, but instead you would generate it directly, saving
some compile time if there are more floating-point operations than
unknown ones. It may help to add them also at branches/joins. And even
then it may not be sufficient. If two branches start with read_fenv or
end with store_fenv, we don't want an optimizer to merge them into a
single call outside of the branches, because then the operation itself,
being const, could move outside of the branch. ISTR that there are ways
to avoid this kind of transformation (mostly meant to avoid duplicating
an inline asm containing a hardcoded label).
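As a rough C analogue of the problem (oper stands for the would-be const
operation, names made up):

/* Stand-in for the const .IFN operation.  */
double oper (double, double);

double g (int c, double x)
{
  double r = 0.0;
  if (c)
    r = oper (x, x);   /* may set FE_OVERFLOW etc. */
  return r;
}

/* If oper() is plain "const", nothing in the IL semantics prevents
   hoisting it above the branch and executing it unconditionally,
   raising exceptions even when !c; the surrounding read_fenv/store_fenv
   are what anchor it inside the branch, so they must not be merged or
   moved out either.  */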
At expansion, I guess read_fenv/store_fenv would expand to nothing; they
were mostly there to protect the true operation, and we could still expand
(res, fenv1) = oper(arg0, arg1, fenv0)
to
res = asm_hide (oper (asm_hide (arg0), asm_hide (arg1)))
if we don't want to also model things in RTL for every target (at least to
begin with).
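Here asm_hide would be the usual empty-asm optimization barrier, along
the lines of glibc's math_opt_barrier (GNU C sketch):

#define asm_hide(x) \
  ({ __typeof (x) __x = (x); __asm ("" : "+m" (__x)); __x; })

/* The empty asm pretends to read and modify its operand, so the
   compiler can neither fold the protected operation nor move it past
   the barrier, without emitting any real instruction (apart from a
   possible spill due to the "m" constraint).  */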
I guess there's nothing else but to try ...
Suppose for example you have
_3 = .IFN_PLUS (_1, _2, 0);
_4 = .IFN_PLUS (_1, _2, 0);
the first plus may alter FP state (set inexact) but since the second plus
computes the same value we'd want to elide it(?).
Assuming there is nothing in between, I think so, yes.
Now if there's a feclearexcept() in between we can't elide it - and that
works as proposed because the memory state is inspected by
feclearexcept().
The exact effect of feclearexcept depends on how we model things. It could
be considered write-only. If the argument is FE_ALL_EXCEPT, things may
also be easier.
In some cases, with
_3 = .IFN_PLUS (_1, _2, 0);
feclearexcept (...);
_4 = .IFN_PLUS (_1, _2, 0);
we may want to elide the first IFN...
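A source-level picture of that case (assuming a compiler that honors the
pragma, which is the point of this whole exercise):

#include <fenv.h>
#pragma STDC FENV_ACCESS ON

double f (double a, double b)
{
  double unused = a + b;           /* its only observable effect is to
                                      possibly set exception flags */
  feclearexcept (FE_ALL_EXCEPT);   /* ... which are wiped right here */
  return a + b;                    /* so the first addition is dead and
                                      could be removed entirely */
}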
But I can't see how we can convince FRE that we can elide the second
plus when both are modifying memory.
Yes, that's certainly harder.
Actually, for optimization purposes, I would distinguish the case where we
care about exceptions and the case where we don't. The few times I've used
exceptions, it was only for a single operation, and I didn't expect any
optimization. On the other hand, I often use hundreds of rounded
operations where I don't care about exceptions. Those can be marked as
pure (I expect querying whether .FENV_PLUS is pure to involve looking at a
bit in its last argument), and would fit much more easily with the current
optimizations. I can't claim that my uses are representative of all uses,
though; some people may do long, regular computations and trap on
FE_INVALID...
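To make that second usage pattern concrete, here is the kind of
rounding-only code I have in mind (it needs something like
-frounding-math today, since GCC does not implement the FENV_ACCESS
pragma yet):

#include <fenv.h>
#pragma STDC FENV_ACCESS ON

/* Directed rounding to get a guaranteed upper bound on a sum; the
   exception flags are never looked at.  */
double sum_upper_bound (const double *a, int n)
{
  int save = fegetround ();
  fesetround (FE_UPWARD);
  double s = 0.0;
  for (int i = 0; i < n; i++)
    s += a[i];
  fesetround (save);
  return s;
}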
I am not that interested in exceptions, but since just rounding does not
match a standard feature, it seemed more sensible to handle both together.
I did wonder about making two sets of functions: the ones with exceptions
(much harder for optimization, although not completely hopeless if people
are really motivated) and the pure ones without exceptions, so that the
first wouldn't hinder the second too much. But starting with the strictest
version looked reasonable.
There's currently no such thing as "the effects on memory state depend
only on the arguments".
This reminds me of the initialization of static/thread_local variables in
functions, when Jason tried to add an attribute, but I don't think it was
ever committed, and the semantics were likely too different.
I _think_ we don't have to say the memory output state depends on the
memory input state (the FP ENV) - well, it does, but the difference only
depends on the actual arguments.
A different rounding mode could cause different exceptions I believe.
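For example, something like this raises FE_OVERFLOW under FE_TONEAREST
but only FE_INEXACT under FE_TOWARDZERO (a sketch; the volatiles are
just there to keep the additions from being folded):

#include <fenv.h>
#include <float.h>
#include <stdio.h>
#pragma STDC FENV_ACCESS ON

int main (void)
{
  volatile double x = DBL_MAX, y = 0x1.8p970;  /* 3/4 ulp of DBL_MAX */

  fesetround (FE_TONEAREST);
  feclearexcept (FE_ALL_EXCEPT);
  volatile double r = x + y;                   /* rounds up to +inf */
  printf ("%d\n", fetestexcept (FE_OVERFLOW) != 0);  /* 1 */

  fesetround (FE_TOWARDZERO);
  feclearexcept (FE_ALL_EXCEPT);
  r = x + y;                                   /* rounds down to DBL_MAX */
  printf ("%d\n", fetestexcept (FE_OVERFLOW) != 0);  /* 0, only inexact */

  return 0;
}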
That said, tracking the FENV together with memory will complicate things,
but explicitly tracking one (or multiple?) extra FP ENV register
input/output doesn't make the problem go away (the second plus still has
the mutated FP ENV from the first plus as input). Instead we'd have to
separately track the effect of a single operation and the overall FP
state, like
(_3, flags_5) = .IFN_PLUS (_1, _2, 0);
fpexstate = merge (flags_5, fpexstate);
(_4, flags_6) = .IFN_PLUS (_1, _2, 0);
fpexstate = merge (flags_6, fpexstate);
We would have to be careful that the second and third statements above
(the first merge and the second .IFN_PLUS) cannot be swapped (unless we
keep all the merges and key expansion on those and not on the IFN? But we
may end up with a use of the sum before the merge).
or so and there we can CSE.
And I guess we would have a transformation
merge(f, merge(f, state)) --> merge(f, state)
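As a C-level sketch of why this decomposition enables CSE (all names
hypothetical): the per-operation flags depend only on the operands, and
merging them into the global state is a simple, idempotent accumulation:

/* Stand-in for the (_res, flags) = .IFN_PLUS form: apart from writing
   *flags it behaves like a const function.  */
double plus_with_flags (double a, double b, unsigned *flags);

unsigned fpexstate;   /* stands for the tracked exception state */

double h (double a, double b)
{
  unsigned f5, f6;
  double r3 = plus_with_flags (a, b, &f5);
  fpexstate |= f5;                    /* merge (f5, fpexstate) */
  double r4 = plus_with_flags (a, b, &f6);
  fpexstate |= f6;                    /* merge (f6, fpexstate) */
  /* r4/f6 duplicate r3/f5, so the second call can be CSEd, and the
     second merge then becomes merge (f5, merge (f5, state)), which
     simplifies to a single merge.  */
  return r3 + r4;
}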
We have to track the exception state separately from the FP control word
for the rounding mode for this to work. Thus when we're not interested in
the exception state, .IFN_PLUS would be "pure" (only dependent on the FP
control word)?
So I guess we should think of somehow separating rounding mode tracking
and exception state? If we make the functions affect memory anyway, we can
have the FP state reg(s) modeled explicitly with fake decl(s) and pass
those by reference to the IFNs? Then we can make use of the "fn spec"
attribute to tell which function reads/writes which reg. Across unknown
functions we'd then have to use the store/load "trick" to merge them with
the global memory state though.
Splitting the rounding mode from the exceptions certainly makes sense,
since they are used quite differently.
_3 = .FENV_PLUS (_1, _2, 0, &fenv_round, &fenv_except)
or just
_3 = .FENV_PLUS (_1, _2, 1, &fenv_round, 0)
or
_3 = .FENV_PLUS (_1, _2, 2, 0, &fenv_except)
when we are not interested in everything.
with fake global decls for fenv_round and fenv_except (so "unknown"
already possibly reads/writes them) and fn specs to say it doesn't look at
other memory? I was more thinking of making that implicit, through magic
in a couple of relevant functions (the value in flags says whether the
global fenv_round or fenv_except is accessed), as a refinement of just
"memory". But IIUC, we would need something that does not use memory at
all (not even one variable) if we wanted to avoid the big penalty in alias
analysis, etc.
If we consider the case without exceptions:
round = get_fenv_round()
_3 = .FENV_PLUS (_1, _2, opts, round)
with .FENV_PLUS "const" and get_fenv_round "pure" (or even reading round
from a fake global variable instead of a function call), that would be
tempting, but it doesn't work, since now .FENV_PLUS can be moved past a
later call to fesetround. Even without exceptions we need some protection
afterwards, so it may be easier to keep the memory (fenv) read as part of
.FENV_PLUS.
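Spelling the migration problem out (the function names are just
hypothetical stand-ins for the IFNs discussed above):

#include <fenv.h>

extern int get_fenv_round (void);                    /* "pure" */
extern double fenv_plus_const (double, double, int); /* "const" */

double k (double a, double b)
{
  int round_mode = get_fenv_round ();
  double x = fenv_plus_const (a, b, round_mode);
  fesetround (FE_TONEAREST);   /* later rounding-mode change */
  return x;
}

/* fenv_plus_const being const, only data dependences constrain it, so
   it may legally be scheduled after the fesetround call; its round_mode
   argument would still hold the old mode, but the hardware addition
   would then execute under the new one.  */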
Also, caring only about rounding doesn't match any standard #pragma, so
such an option may see very little use in practice...
Sorry for the incoherent brain-dump above ;)
It is great to have someone to discuss this with!
--
Marc Glisse