Actually, you need another patch to make this work:

Index: gcc/tree-vect-loop-manip.c
===================================================================
--- gcc/tree-vect-loop-manip.c (revision 199416)
+++ gcc/tree-vect-loop-manip.c (working copy)
@@ -855,7 +855,6 @@
       /* All loops have an outer scope; the only case loop->outer is NULL is for
          the function itself.  */
       || !loop_outer (loop)
-      || loop->num_nodes != 2
       || !empty_block_p (loop->latch)
       || !single_exit (loop)
       /* Verify that new loop exit condition can be trivially modified.  */
Dehao

On Thu, May 30, 2013 at 12:03 PM, Toon Moene <t...@moene.org> wrote:
> On 05/30/2013 02:46 AM, Dehao Chen wrote:
>
>> In tree-vect-loop.c, it limits the vectorization only to loops that have 2
>> BBs:
>>
>>   /* Inner-most loop.  We currently require that the number of BBs is
>>      exactly 2 (the header and latch).  Vectorizable inner-most loops
>>      look like this:
>>
>>                         (pre-header)
>>                            |
>>                           header<--------+
>>                            | |           |
>>                            | +--> latch --+
>>                            |
>>                         (exit-bb)  */
>>
>>       if (loop->num_nodes != 2)
>>         {
>>           if (dump_enabled_p ())
>>             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>>                              "not vectorized: control flow in loop.");
>>           return NULL;
>>         }
>
>
>> Any insights why the limit is set to 2? We found that removing this
>> limit actually improve performance for many applications.
>
>
> It might have been just "safety first" - we know how to do single basic
> block inner loops, let's stick with them for the moment (this development
> was started around a decade ago).
>
> Our 3.5 million lines of Fortran 90 code (mostly array expressions) and
> 125,000 lines of arbitrary C code is currently normally compiled with:
>
> $ gfortran -v
> Using built-in specs.
> COLLECT_GCC=gfortran
> COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.7/lto-wrapper
> Target: x86_64-linux-gnu
> Configured with: ../src/configure -v --with-pkgversion='Debian 4.7.3-4'
> --with-bugurl=file:///usr/share/doc/gcc-4.7/README.Bugs
> --enable-languages=c,c++,go,fortran,objc,obj-c++ --prefix=/usr
> --program-suffix=-4.7 --enable-shared --enable-linker-build-id
> --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
> --with-gxx-include-dir=/usr/include/c++/4.7 --libdir=/usr/lib --enable-nls
> --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug
> --enable-libstdcxx-time=yes --enable-gnu-unique-object --enable-plugin
> --with-system-zlib --enable-objc-gc --with-cloog --enable-cloog-backend=ppl
> --disable-cloog-version-check --disable-ppl-version-check --enable-multiarch
> --with-arch-32=i586 --with-abi=m64 --with-multilib-list=m32,m64,mx32
> --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu
> --host=x86_64-linux-gnu --target=x86_64-linux-gnu
> Thread model: posix
> gcc version 4.7.3 (Debian 4.7.3-4)
>
> So I tried it with:
>
> $ /usr/snp/bin/gfortran -v
> Using built-in specs.
> COLLECT_GCC=/usr/snp/bin/gfortran
> COLLECT_LTO_WRAPPER=/usr/snp/libexec/gcc/x86_64-unknown-linux-gnu/4.7.4/lto-wrapper
> Target: x86_64-unknown-linux-gnu
> Configured with: ../gcc-4_7-branch/configure --prefix=/usr/snp --with-gnu-as
> --with-gnu-ld --enable-languages=fortran --disable-libmudflap
> --disable-multilib --disable-nls --with-arch=native --with-tune=native
> Thread model: posix
> gcc version 4.7.4 20130530 (prerelease) (GCC)
>
> augmented by this single change:
>
> toon@super:~/compilers/gcc-4_7-branch/gcc$ svn diff
> Index: tree-vect-loop.c
> ===================================================================
> --- tree-vect-loop.c (revision 199454)
> +++ tree-vect-loop.c (working copy)
> @@ -1002,6 +1002,8 @@
>                             |
>                          (exit-bb)  */
>
> +  /* Disabled check
>
> +
>        if (loop->num_nodes != 2)
>          {
>            if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
> @@ -1009,6 +1011,8 @@
>            return NULL;
>          }
>
> +  */
> +
>        if (empty_block_p (loop->header))
>          {
>            if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
>
> Amazingly enough, I didn't hit *any* ICE. Also, running the generated
> executables produced reasonable results (you have to trust me that it is
> *very hard* to fake correct meteorological results if you blow up the
> generated code).
>
> Unfortunately, the relative importance of conditional code in inner loops is
> not sufficient to show any speedup on our code.
>
> Nevertheless, it would be a huge improvement on *other* codes if we could
> lift this restriction.
>
> --
> Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
> Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
> At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
> Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news
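
For anyone following the thread, here is a minimal sketch of the two loop shapes that the loop->num_nodes != 2 check tells apart. The function names scale and scale_positive are made up for illustration and do not come from GCC or from the code discussed above. The first loop has a straight-line body, so its CFG is just a header and a latch. The second has a branch in the body, which adds basic blocks inside the loop unless an earlier pass (if-conversion, for instance) flattens it. Whether either loop is actually vectorized depends on the GCC version, target and flags, so treat this as a test case to experiment with, not as a statement of what the vectorizer will do.

```c
/* Hypothetical test case: two inner loops with different CFG shapes. */

/* Straight-line body: one block of work per iteration, then the latch. */
void scale(float *restrict dst, const float *restrict src, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = 2.0f * src[i];
}

/* Conditional body: the if introduces control flow inside the loop,
   so the loop has more than two basic blocks unless it is if-converted. */
void scale_positive(float *restrict dst, const float *restrict src, int n)
{
    for (int i = 0; i < n; i++) {
        if (src[i] > 0.0f)
            dst[i] = 2.0f * src[i];
    }
}
```

One way to try it, assuming a 4.7-era compiler, is to compile with -std=c99 -O3 -c plus -ftree-vectorizer-verbose=2 and compare the vectorizer's reports for the two functions with and without the patches above. Newer releases report the same information through the -fopt-info-vec family of options.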