> The GIMPLE level if-conversion code was purely written to make loops suitable for vectorization.
I'm not surprised to read that.
> It wasn't meant to provide if-conversion of scalar code in the end (even though it does).
Serendipity sure is nice. ;-)
> We've discussed enabling the versioning path unconditionally for example.
It might make sense even without vectorization, for target architectures with [1] either a wide range of predicated instructions or at least a usable "cmove-like" instruction, _and_ [2] a target that either _never_ runs off battery or only extremely rarely does. When running off "wall outlet" electricity [and not caring about the electric bill ;-)], wasted speculation is usually just wasted energy that didn't cost any extra time, because the CPU/core would have been idle otherwise. When running off battery, the wasted energy _can_ be unacceptable, but is not _necessarily_ so: it depends on the customer/programmer/user's priorities, especially execution speed vs. how long a single charge allows the machine to run.

The preceding makes me wonder: has anybody considered adding an optimization profile for GCC, to add to the set {"-O"..."-O3", "-Ofast", "-Os"}, that optimizes for the amount of energy consumed? I don't remember reading about anything like that in relation to compiler research, but perhaps somebody reading this _has_ seen [or done!] something related and would kindly reply. Obviously, this is not an easy thing to figure out: in _most_ cases finishing the job sooner -- i.e. running faster -- means less energy spent computing the job than would otherwise have been the case, but this is not _always_ true; for example, speculative execution that has a 50% probability of being wasteful instead of just idling in a low-power state.
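To make the speed-vs.-energy trade-off concrete, here is a small scalar example [plain C, invented for this message, not taken from any benchmark or from GCC itself]: the second function is roughly what a cmove-style if-conversion of the first one amounts to. It removes a possibly-mispredicted branch, but it always pays for computing the untaken value -- which is exactly the "wasted speculation" I mean.

#include <stddef.h>

/* Branchy form: nothing is added unless the condition holds, but an
   unpredictable branch can cost many cycles per misprediction.  */
long sum_if_positive (const int *a, size_t n)
{
  long sum = 0;
  for (size_t i = 0; i < n; i++)
    if (a[i] > 0)
      sum += a[i];
  return sum;
}

/* If-converted form: the addend is always computed and a cmove-like
   select picks either a[i] or 0.  Often faster when the branch is
   unpredictable, but the "untaken" work is done on every iteration,
   i.e. extra energy is spent even when it buys no time.  */
long sum_if_positive_ifconv (const int *a, size_t n)
{
  long sum = 0;
  for (size_t i = 0; i < n; i++)
    {
      long addend = (a[i] > 0) ? a[i] : 0;
      sum += addend;
    }
  return sum;
}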
> So if the new scheme with scratch-pads produces more "correct" code but code
> that will known to fail vectorization then it's done at the wrong place -
> because the whole purpose of GIMPLE if-conversion is to enable more vectorization.

I think I understand, and I agree. The purpose of this pass is to enable more vectorization; the recently-reported fact that it can also enable more cmove-style non-vectorized code can also be beneficial, but is not the main objective.

The main benefit of the new if converter is not vs. "GCC without any if conversion", but rather vs. the _old_ if converter. The old one can, in some cases, produce code that e.g. dereferences a null pointer when the same program given the same inputs would not have done so without the if-conversion "optimization". The new converter reduces/eliminates this problem. Therefore, my current main goal is to eliminate the performance regressions that are not spurious [i.e. are not a direct result of the old conversion being unsafe], so that the new converter can be merged to trunk and also enabled implicitly by "-O3" for autovectorization-enabled architectures, which the old converter AFAIK was _not_ [due to the aforementioned safety issues].

In other words, the old if converter was like a sharp knife with a very small handle: usable by experts, but dangerous for people with little knowledge of the run-time properties of the code [e.g. will a pointer ever be null?] who just want to pass in "-O3" and have the code run faster without much thinking. A typical GCC user: "This code runs fine when compiled with ''-O1'' and ''-O2'', so with ''-O3'' it should also be fine, only faster!"

IMO, only those flags that _explicitly_ request unsafe transformations should be allowed to cause {source code that runs perfectly when compiled with a low optimization setting} to be compiled to code that may crash or may compute a different result than under a low-optimization setting [e.g. compiling floating-point code such that the executable ignores NaNs or equates denorms with zero], even when given the same inputs as a non-crashing, correct-result-producing, less-optimized build of the same source. AFAIK this is in accordance with GCC's philosophy, which explains why the old if converter was not enabled by default.

The _new_ if converter, OTOH, is safe enough to enable by default under "-O3", and should be beneficial for targets that support vector operations and for which the autovectorizer is successful in generating vector code. Those are probably the main reasons why the new converter is worth hacking on to get it into shape, performance-regression-wise. Plus, this is work that my employer [Samsung] is willing and able to fund at this time [by paying my salary while I work on it ;-)].

Regards,

Abe
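P.S. In case a concrete [if simplified] picture helps: the following plain-C sketch shows the general idea behind the scratch-pads mentioned above. It is _not_ the converter's actual output [which operates on GIMPLE], and the function names are mine, but it illustrates how a conditional store can be made unconditional -- and therefore branch-free and more amenable to vectorization -- without ever writing through an address that the original program would not have written through.

#include <stddef.h>

/* Original, branchy loop: when the guard is false, no store happens at
   all, so an element of "out" that must not be touched is never touched.  */
void conditional_update (int *out, const int *in, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (in[i] > 0)
      out[i] = in[i];
}

/* Scratch-pad form: the *address* is selected, then the store is done
   unconditionally.  The loop body is now branch-free, yet it only ever
   writes to either the originally-written location or a private dummy
   location owned by the transformed code.  */
void conditional_update_scratchpad (int *out, const int *in, size_t n)
{
  int scratch;
  for (size_t i = 0; i < n; i++)
    {
      int *dest = (in[i] > 0) ? &out[i] : &scratch;
      *dest = in[i];
    }
}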