On Tue, Jul 21, 2015 at 10:19 PM, Abe <abe_skol...@yahoo.com> wrote: > First: my apologies for the delay in this reply.
Likewise ;) > > [Richard wrote:] >> >> Well, but we do have a pretty strong if-converter on RTL > >> which has access to target specific information. > > Yes, I have had a quick look at it. It looks quite thorough. > > I think I see that you [Richard] are implying that the if converter > at the GIMPLE level should not be trying to do all the if-conversion > work that could possibly be done. I agree with that. However, > AFAIK the RTL work is done strictly after the autovectorization, > so any if conversion that is strictly for the benefit of > autovectorization must be done before autovectorization > and therefor at the GIMPLE level. Corrections are welcome. Yes, the GIMPLE if-converter should strictly be a vectorization enabler. If the "new" if-converter produces code that the vectorizer does not handle (on the target targeted) then it has done a bad job. > > [Abe wrote:] >>> >>> The preceding makes me wonder: has anybody considered adding an >>> optimization >>> profile for GCC, […] that optimizes for the amount of energy consumed? > >>> I don`t remember reading about anything like that […] > > [Richard wrote:] >> >> I think there were GCC summit papers/talks about this. > > > Thanks, but can you please be more specific? > > After writing the message quotes above as "[Abe wrote:]" > I found 2 or 3 papers about compiling code with an eye > towards energy efficiency, but not a whole hell of a lot, > and I didn`t yet find anything GCC-specific on this topic. > > > [Abe wrote:] >>> >>> The old one can, in some cases, produce code that >>> e.g. dereferences a null pointer when the same >>> program given the same inputs would have not > >>> done so without the if-conversion "optimization". > > [Richard wrote:] >> >> Testcase? I don't think it can and if it can this bug needs to be fixed. > > > With the program below my sign-off, using stock GCC 4.9.2, even under "-O3" > the code compiles and runs OK, even when using "-ftree-loop-if-convert", > which is I guess what you [Richard] meant by a recent comment to me [Abe]. > Sebastian confirmed in person that even the old if converter did things > differently > even to loads when GCC is invoked with the full > "-ftree-loop-if-convert-stores" > flag but without "-ftree-loop-if-convert-stores". With the old converter, > compiling with "-ftree-loop-if-convert-stores" yields a program that > segfaults due > to dereferencing a null pointer [it would deref. a _lot_ of them if it > _could_ ;-)]. > > [tested using GCC 4.9.2 on both Cygwin (64-bit) for Windows 7, AMD64 > compilation, > and Mac OS X 10.6.8 -- also using GCC 4.9.2 -- compiling for both ia32 and > AMD64] > > I intend to adapt the test case to DejaGNU format and add > it to the codebase from which the patch is being generated. I think I fixed this bug. It was because of an inverted test. 2015-07-10 Richard Biener <rguent...@suse.de> PR tree-optimization/66823 * tree-if-conv.c (memrefs_read_or_written_unconditionally): Fix inverted predicate. > > [Abe wrote:] >>> >>> The new converter reduces/eliminates this problem. > > > [Richard wrote:] >> >> You mean the -ftree-loop-if-convert-stores path. > > > The old converter apparently only produced code with the aforementioned > crashing problem only when "-ftree-loop-if-convert-stores" is/was in use, > yes. Due to a bug. It was never supposed to do invalid transforms. It was known to produce store data-races with -ftree-loop-if-convert-stores. That is something the scratchpad use avoids (at the expense of a lot(?) of vectorization). > The new one should not be producing code with that problem > regardless of "-ftree-loop-if-convert-stores" or lack thereof. > > I think the reason for the confusing ambiguity is that since the old > converter > did conversion of stores in a way that was thread-unsafe for half hammocks > [e.g. C source code like "if (condition) A[a] = something;" with no > attached > "else", assuming "condition" and "something" are both conversion-friendly], > somebody used "-ftree-loop-if-convert-stores" to mean "if-convert as much > as possible even if doing so requires converting unsafely". In the new > converter we have no such unsafety TTBOMK, so I propose to remove that flag. First of all to fix regressions on the load if-conversion side you should stop using a scratchpad for loads (do you have one? I seem to remember you do from the design description). load data-races are only theoretical. Then you should IMHO keep the old store path for --param allow-store-data-races=1. I'd like to see a comparison in the number of vectorized loops for SPEC CPU 2006 FP with -Ofast -ftree-loop-if-convert-stores with / without the new if-converter (and if you like also without -ftree-loop-if-convert-stores). You can use -fopt-info-vec and grep for the number of 'loop vectorized' cases. Thanks, Richard. > Regards, > > Abe > > > > > > Makefile > ======== > all: foo_______if-converted foo_______if-converted_with_stores_flag > foo___NOT_if-converted > > foo_______if-converted_with_stores_flag: foo.c > gcc -std=c99 -O3 -ftree-loop-if-convert-stores foo.c -o > foo_______if-converted_with_stores_flag > > foo_______if-converted: foo.c > gcc -std=c99 -O3 -ftree-loop-if-convert foo.c -o > foo_______if-converted > > foo___NOT_if-converted: foo.c > gcc -std=c99 -O3 foo.c -o > foo___NOT_if-converted > > > > > foo.c > ===== > /* intentionally not defining "SIZE" here: > pretending that "choose" is compiled separately from "main" > */ > > /* a "controlled copy-paste" that takes 4 arrays, 2 of which are full of > pointers, > and for each index deref.s one of the 2 pointers and shoves the result in > one > of the arrays, based on the value in the respective index of the > remaining array > */ > > void __attribute__((noinline)) /* forcing no inlining b/c inlining > theoretically allow > sufficient analysis to allow the optimizer > to "see" > that "cond_array[...]" is full of nothing > but 0s > */ > choose(char* cond_array, > short ** pointer_array_1, > short ** pointer_array_2, > short * output_array, > unsigned long long len) { > > for (unsigned long long index = 0; index < len; ++index) > if (cond_array[index]) > output_array[index] = *pointer_array_1[index]; > else > output_array[index] = *pointer_array_2[index]; > > } > > > > #define SIZE 9 > > int main() { > > char condition[SIZE]; > short data_in [SIZE]; > short data_out[SIZE]; > short * pointer_array_1[SIZE]; > short * pointer_array_2[SIZE]; > > for (unsigned short index = 0; index < SIZE; ++index) { > condition [index] = 0; /*** all false ***/ > pointer_array_1[index] = 0; /*** all null pointers ***/ > pointer_array_2[index] = &data_in[index]; /*** all good pointers ***/ > } > > choose(&condition[0], &pointer_array_1[0], &pointer_array_2[0], > &data_out[0], SIZE); > > } > > > > > > > > result from running MacOSX/ia32 compilation under GDB 6.3.50* > ============================================================= > Program received signal EXC_BAD_ACCESS, Could not access memory. > Reason: KERN_PROTECTION_FAILURE at address: 0x00000000 > 0x00001e2b in choose () > > > > result from running MacOSX/AMD64 compilation under GDB 6.3.50* > ============================================================== > Program received signal EXC_BAD_ACCESS, Could not access memory. > Reason: KERN_INVALID_ADDRESS at address: 0x0000000000000000 > 0x0000000100000e90 in choose () > > > > * Apple-supplied version of GDB, identified as: > > GNU gdb 6.3.50-20050815 (Apple version gdb-1515) (Sat Jan 15 08:33:48 > UTC 2011)