On Tue, Jul 21, 2015 at 10:19 PM, Abe <abe_skol...@yahoo.com> wrote:
> First: my apologies for the delay in this reply.

Likewise ;)

>
> [Richard wrote:]
>>
>> Well, but we do have a pretty strong if-converter  on RTL
>
>> which has access to target specific information.
>
> Yes, I have had a quick look at it.  It looks quite thorough.
>
> I think I see that you [Richard] are implying that the if converter
> at the GIMPLE level should not be trying to do all the if-conversion
> work that could possibly be done.  I agree with that.  However,
> AFAIK the RTL work is done strictly after the autovectorization,
> so any if conversion that is strictly for the benefit of
> autovectorization must be done before autovectorization
> and therefor at the GIMPLE level.  Corrections are welcome.

Yes, the GIMPLE if-converter should strictly be a vectorization enabler.
If the "new" if-converter produces code that the vectorizer does not
handle (on the target targeted) then it has done a bad job.

>
> [Abe wrote:]
>>>
>>> The preceding makes me wonder: has anybody considered adding an
>>> optimization
>>> profile for GCC, […] that optimizes for the amount of energy consumed?
>
>>> I don`t remember reading about anything like that […]
>
> [Richard wrote:]
>>
>> I think there were GCC summit papers/talks about this.
>
>
> Thanks, but can you please be more specific?
>
> After writing the message quotes above as "[Abe wrote:]"
> I found 2 or 3 papers about compiling code with an eye
> towards energy efficiency, but not a whole hell of a lot,
> and I didn`t yet find anything GCC-specific on this topic.
>
>
> [Abe wrote:]
>>>
>>> The old one can, in some cases, produce code that
>>> e.g. dereferences a null pointer when the same
>>> program given the same inputs would have not
>
>>> done so without the if-conversion "optimization".
>
> [Richard wrote:]
>>
>> Testcase?  I don't think it can and if it can this bug needs to be fixed.
>
>
> With the program below my sign-off, using stock GCC 4.9.2, even under "-O3"
> the code compiles and runs OK, even when using "-ftree-loop-if-convert",
> which is I guess what you [Richard] meant by a recent comment to me [Abe].
> Sebastian confirmed in person that even the old if converter did things
> differently
> even to loads when GCC is invoked with the full
> "-ftree-loop-if-convert-stores"
> flag but without "-ftree-loop-if-convert-stores".  With the old converter,
> compiling with "-ftree-loop-if-convert-stores" yields a program that
> segfaults due
> to dereferencing a null pointer [it would deref. a _lot_ of them if it
> _could_ ;-)].
>
> [tested using GCC 4.9.2 on both Cygwin (64-bit) for Windows 7, AMD64
> compilation,
>  and Mac OS X 10.6.8 -- also using GCC 4.9.2 -- compiling for both ia32 and
> AMD64]
>
> I intend to adapt the test case to DejaGNU format and add
> it to the codebase from which the patch is being generated.

I think I fixed this bug.  It was because of an inverted test.

2015-07-10  Richard Biener  <rguent...@suse.de>

        PR tree-optimization/66823
        * tree-if-conv.c (memrefs_read_or_written_unconditionally): Fix
        inverted predicate.

>
> [Abe wrote:]
>>>
>>> The new converter reduces/eliminates this problem.
>
>
> [Richard wrote:]
>>
>> You mean the -ftree-loop-if-convert-stores path.
>
>
> The old converter apparently only produced code with the aforementioned
> crashing problem only when "-ftree-loop-if-convert-stores" is/was in use,
> yes.

Due to a bug.  It was never supposed to do invalid transforms.  It was
known to produce store data-races with -ftree-loop-if-convert-stores.
That is something the scratchpad use avoids (at the expense of
a lot(?) of vectorization).

> The new one should not be producing code with that problem
> regardless of "-ftree-loop-if-convert-stores" or lack thereof.
>
> I think the reason for the confusing ambiguity is that since the old
> converter
> did conversion of stores in a way that was thread-unsafe for half hammocks
> [e.g. C source code like "if (condition)  A[a] = something;" with no
> attached
> "else", assuming "condition" and "something" are both conversion-friendly],
> somebody used "-ftree-loop-if-convert-stores" to mean "if-convert as much
> as possible even if doing so requires converting unsafely".  In the new
> converter we have no such unsafety TTBOMK, so I propose to remove that flag.

First of all to fix regressions on the load if-conversion side you should stop
using a scratchpad for loads (do you have one?  I seem to remember you
do from the design description).  load data-races are only theoretical.

Then you should IMHO keep the old store path for --param
allow-store-data-races=1.

I'd like to see a comparison in the number of vectorized loops for
SPEC CPU 2006 FP
with -Ofast -ftree-loop-if-convert-stores with / without the new if-converter
(and if you like also without -ftree-loop-if-convert-stores).  You can use
-fopt-info-vec and grep for the number of 'loop vectorized' cases.

Thanks,
Richard.

> Regards,
>
> Abe
>
>
>
>
>
> Makefile
> ========
> all: foo_______if-converted foo_______if-converted_with_stores_flag
> foo___NOT_if-converted
>
> foo_______if-converted_with_stores_flag: foo.c
>         gcc -std=c99 -O3 -ftree-loop-if-convert-stores foo.c -o
> foo_______if-converted_with_stores_flag
>
> foo_______if-converted: foo.c
>         gcc -std=c99 -O3 -ftree-loop-if-convert        foo.c -o
> foo_______if-converted
>
> foo___NOT_if-converted: foo.c
>         gcc -std=c99 -O3                               foo.c -o
> foo___NOT_if-converted
>
>
>
>
> foo.c
> =====
> /* intentionally not defining "SIZE" here:
>      pretending that "choose" is compiled separately from "main"
> */
>
> /* a "controlled copy-paste" that takes 4 arrays, 2 of which are full of
> pointers,
>    and for each index deref.s one of the 2 pointers and shoves the result in
> one
>    of the arrays, based on the value in the respective index of the
> remaining array
> */
>
> void __attribute__((noinline)) /* forcing no inlining b/c inlining
> theoretically allow
>                                   sufficient analysis to allow the optimizer
> to "see"
>                                   that "cond_array[...]" is full of nothing
> but 0s
>                                */
>   choose(char*       cond_array,
>          short ** pointer_array_1,
>          short ** pointer_array_2,
>          short *   output_array,
>          unsigned long long len) {
>
>   for (unsigned long long index = 0; index < len; ++index)
>     if (cond_array[index])
>       output_array[index] = *pointer_array_1[index];
>     else
>       output_array[index] = *pointer_array_2[index];
>
> }
>
>
>
> #define SIZE 9
>
> int main() {
>
>   char           condition[SIZE];
>   short           data_in [SIZE];
>   short           data_out[SIZE];
>   short *  pointer_array_1[SIZE];
>   short *  pointer_array_2[SIZE];
>
>   for (unsigned short index = 0; index < SIZE; ++index) {
>     condition      [index]  = 0;              /*** all false         ***/
>     pointer_array_1[index]  = 0;              /*** all null pointers ***/
>     pointer_array_2[index] = &data_in[index]; /*** all good pointers ***/
>   }
>
>   choose(&condition[0], &pointer_array_1[0], &pointer_array_2[0],
> &data_out[0], SIZE);
>
> }
>
>
>
>
>
>
>
> result from running MacOSX/ia32 compilation under GDB 6.3.50*
> =============================================================
> Program received signal EXC_BAD_ACCESS, Could not access memory.
> Reason: KERN_PROTECTION_FAILURE at address: 0x00000000
> 0x00001e2b in choose ()
>
>
>
> result from running MacOSX/AMD64 compilation under GDB 6.3.50*
> ==============================================================
> Program received signal EXC_BAD_ACCESS, Could not access memory.
> Reason: KERN_INVALID_ADDRESS at address: 0x0000000000000000
> 0x0000000100000e90 in choose ()
>
>
>
> * Apple-supplied version of GDB, identified as:
>
>     GNU gdb 6.3.50-20050815 (Apple version gdb-1515) (Sat Jan 15 08:33:48
> UTC 2011)

Reply via email to