> The GIMPLE level if-conversion code was purely written to make loops suitable for vectorization.
I'm not surprised to read that.
> It wasn't meant to provide if-conversion of scalar code in the end (even though it does).
Serendipity sure is nice. ;-)
> We've discussed enabling the versioning path unconditionally for example.
It might make sense even without vectorization, for target architectures with [1] either a wide range of predicated instructions or at least a usable "cmove-like" instruction, _and_ [2] a target that either _never_ runs off battery or only extremely rarely does. When running off "wall outlet" electricity [and not caring about the electric bill ;-)], wasted speculation is usually just wasted energy that didn't cost any extra time, because the CPU/core would have been idle otherwise. When running off battery, the wasted energy _can_ be unacceptable, but is not _necessarily_ so: it depends on the customer/programmer/user's priorities, especially execution speed vs. how long a single charge allows the machine to run.

The preceding makes me wonder: has anybody considered adding an optimization profile for GCC, to add to the set {"-O"..."-O3", "-Ofast", "-Os"}, that optimizes for the amount of energy consumed? I don't remember reading about anything like that in relation to compiler research, but perhaps somebody reading this _has_ seen [or done!] something related and would kindly reply. Obviously, this is not an easy thing to figure out: in _most_ cases finishing the job sooner -- i.e. running faster -- means less energy spent computing the job than would otherwise have been the case, but this is not _always_ true; for example, speculative execution that has a 50% probability of being wasteful instead of just idling in a low-power state.
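To make the speed-vs.-energy trade-off concrete, here is a small scalar example [plain C, invented for this message, not taken from any benchmark or from GCC itself]: the second function is roughly what a cmove-style if-conversion of the first one amounts to. It removes a possibly-mispredicted branch, but it always pays for computing the untaken value -- which is exactly the "wasted speculation" I mean.

#include <stddef.h>

/* Branchy form: nothing is added unless the condition holds, but an
   unpredictable branch can cost many cycles per misprediction.  */
long sum_if_positive (const int *a, size_t n)
{
  long sum = 0;
  for (size_t i = 0; i < n; i++)
    if (a[i] > 0)
      sum += a[i];
  return sum;
}

/* If-converted form: the addend is always computed and a cmove-like
   select picks either a[i] or 0.  Often faster when the branch is
   unpredictable, but the "untaken" work is done on every iteration,
   i.e. extra energy is spent even when it buys no time.  */
long sum_if_positive_ifconv (const int *a, size_t n)
{
  long sum = 0;
  for (size_t i = 0; i < n; i++)
    {
      long addend = (a[i] > 0) ? a[i] : 0;
      sum += addend;
    }
  return sum;
}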
> So if the new scheme with scratch-pads produces more "correct" code but code
> that will known to fail vectorization then it's done at the wrong place -
> because the whole purpose of GIMPLE if-conversion is to enable more vectorization.

I think I understand, and I agree. The purpose of this pass is to enable more vectorization; the recently-reported fact that it can also enable more cmove-style non-vectorized code can also be beneficial, but is not the main objective.

The main benefit of the new if converter is not vs. "GCC without any if conversion", but rather vs. the _old_ if converter. The old one can, in some cases, produce code that e.g. dereferences a null pointer when the same program given the same inputs would not have done so without the if-conversion "optimization". The new converter reduces/eliminates this problem. Therefore, my current main goal is to eliminate the performance regressions that are not spurious [i.e. are not a direct result of the old conversion being unsafe], so that the new converter can be merged to trunk and also enabled implicitly by "-O3" for autovectorization-enabled architectures, which the old converter AFAIK was _not_ [due to the aforementioned safety issues].

In other words, the old if converter was like a sharp knife with a very small handle: usable by experts, but dangerous for people with little knowledge of the run-time properties of the code [e.g. will a pointer ever be null?] who just want to pass in "-O3" and have the code run faster without much thinking. A typical GCC user: "This code runs fine when compiled with ''-O1'' and ''-O2'', so with ''-O3'' it should also be fine, only faster!"

IMO, only those flags that _explicitly_ request unsafe transformations should be allowed to cause {source code that runs perfectly when compiled with a low optimization setting} to be compiled to code that may crash or may compute a different result than under a low-optimization setting [e.g. compiling floating-point code such that the executable ignores NaNs or equates denorms with zero], even when given the same inputs as a non-crashing, correct-result-producing, less-optimized build of the same source. AFAIK this is in accordance with GCC's philosophy, which explains why the old if converter was not enabled by default.

The _new_ if converter, OTOH, is safe enough to enable by default under "-O3", and should be beneficial for targets that support vector operations and for which the autovectorizer is successful in generating vector code. Those are probably the main reasons why the new converter is worth hacking on to get it into shape, performance-regression-wise. Plus, this is work that my employer [Samsung] is willing and able to fund at this time [by paying my salary while I work on it ;-)].

Regards,

Abe
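P.S. In case a concrete [if simplified] picture helps: the following plain-C sketch shows the general idea behind the scratch-pads mentioned above. It is _not_ the converter's actual output [which operates on GIMPLE], and the function names are mine, but it illustrates how a conditional store can be made unconditional -- and therefore branch-free and more amenable to vectorization -- without ever writing through an address that the original program would not have written through.

#include <stddef.h>

/* Original, branchy loop: when the guard is false, no store happens at
   all, so an element of "out" that must not be touched is never touched.  */
void conditional_update (int *out, const int *in, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (in[i] > 0)
      out[i] = in[i];
}

/* Scratch-pad form: the *address* is selected, then the store is done
   unconditionally.  The loop body is now branch-free, yet it only ever
   writes to either the originally-written location or a private dummy
   location owned by the transformed code.  */
void conditional_update_scratchpad (int *out, const int *in, size_t n)
{
  int scratch;
  for (size_t i = 0; i < n; i++)
    {
      int *dest = (in[i] > 0) ? &out[i] : &scratch;
      *dest = in[i];
    }
}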