On Mon, Aug 19, 2013 at 11:53 AM, Richard Biener <richard.guent...@gmail.com> wrote:
> Xinliang David Li <davi...@google.com> wrote:
>> +cc auto-vectorizer maintainers.
>>
>> David
>>
>> On Mon, Aug 19, 2013 at 10:37 AM, Cong Hou <co...@google.com> wrote:
>>> Nowadays, SIMD instructions play an increasingly important role in everyday computation. AVX and AVX2 have extended the 128-bit registers to 256 bits, and the newly announced AVX-512 doubles the size again. The benefit we can get from vectorization keeps growing. Enabling the vectorizer by default at -O2 is also common practice in other compilers:
>>>
>>> 1) Intel's ICC has turned on its vectorizer at -O2 by default for many years;
>>>
>>> 2) Most recently, LLVM turned it on for both -O2 and -Os.
>>>
>>> Here we propose moving vectorization from -O3 to -O2 in GCC. The three main concerns about this change are: 1. Does vectorization greatly increase the generated code size? 2. How much does performance improve? 3. Does vectorization increase compile time significantly?
>>>
>>> I have fixed the GCC bootstrap failure with the vectorizer turned on (http://gcc.gnu.org/ml/gcc-patches/2013-07/msg00497.html). To evaluate the size and performance impact, I ran experiments on SPEC06 and internal benchmarks. Based on the data, I have tuned the vectorizer parameters to reduce the code bloat without sacrificing the performance gain. There are some performance regressions in SPEC06; their root causes have been analyzed and understood, and I will file bugs to track them independently. The experiments failed on three benchmarks (please refer to http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56993). The experiment results are attached here as two PDF files. Below is our summary of the results:
>>>
>>> 1) We noticed that vectorization can increase the generated code size, so we tried to mitigate this with some tunings, which include setting a higher loop bound so that loops with few iterations are not vectorized, and disabling loop versioning. After these tunings, the average size increase drops from 9.84% to 7.08% (from 13.93% to 10.75% for Fortran benchmarks, and from 3.55% to 1.44% for C/C++ benchmarks). The code size increase for some Fortran benchmarks is still significant (from 18.72% to 34.15%), but the performance gain is also huge, so we think this size increase is reasonable. For C/C++ benchmarks, the size increase is very small (below 3%, except for 447.dealII).
>>>
>>> 2) Vectorization improves performance for most benchmarks, by around 2.5%-3% on average, and by much more for Fortran benchmarks. On Sandy Bridge machines, the improvement is larger when using -march=corei7 (3.27% on average) or -march=corei7-avx (4.81% on average) (please see the attachment for details). We also noticed some performance degradations; after investigation, we found that some of them are caused by limitations of GCC's vectorizer (e.g. GCC's SLP cannot vectorize a group of accesses whose size is not divisible by the VF, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49955, and any data dependence between statements can prevent vectorization; a minimal sketch of such loops appears after this thread), which can be resolved in the future.
>>>
>>> 3) Lastly, we found that enabling vectorization barely affects build time; the increase in GCC bootstrap time is negligible.
>>>
>>> As a reference, Richard Biener is also proposing to move vectorization to -O2 by improving the cost model (http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00904.html).
>
> And my conclusion is that we are not ready for this. The compile-time cost does not outweigh the benefit.
Can you elaborate on your reasoning? Thanks,

David

>
> Richard.
>
>>> Vectorization has great performance potential -- the more people use it, the more likely it is to be further improved -- turning it on at -O2 is the way to go ...
>>>
>>> Thank you!
>>>
>>> Cong Hou
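
For reference, below is a minimal C sketch of the two loop shapes mentioned in point 2 of the proposal: a group of strided accesses whose size does not match the vectorization factor, and a loop-carried data dependence. The loop bodies, function names, and the compile command (including the loop-bound and loop-versioning parameters alluded to in point 1) are illustrative assumptions, not code or settings taken from the benchmarks or the patch.

/* Minimal sketch only -- the loops, names, and parameter values below are
   assumptions for illustration, not taken from the proposal or the
   benchmarks.  A possible compile command, using existing GCC knobs that
   roughly correspond to the tunings described in point 1 (higher loop
   bound, no loop versioning):

     gcc -O2 -ftree-vectorize \
         --param min-vect-loop-bound=8 \
         --param vect-max-version-for-alias-checks=0 \
         --param vect-max-version-for-alignment-checks=0 \
         -fdump-tree-vect-details -c vect_sketch.c

   The vect dump shows which loops were vectorized and why others were not.  */

#define N 1024

float a[N], b[N], c[N];

/* Accesses grouped with a stride of 3.  With 128-bit vectors of float the
   vectorization factor (VF) is 4, which a group size of 3 does not match
   evenly -- the kind of case PR49955 (cited in point 2) describes, where
   GCC's SLP gives up.  */
void
grouped_accesses (void)
{
  int i;
  for (i = 0; i < N / 3; i++)
    {
      a[3 * i]     = b[3 * i]     + 1.0f;
      a[3 * i + 1] = b[3 * i + 1] + 1.0f;
      a[3 * i + 2] = b[3 * i + 2] + 1.0f;
    }
}

/* A data dependence between statements: each iteration reads the value the
   previous iteration wrote, so the loop cannot be vectorized as written.  */
void
carried_dependence (void)
{
  int i;
  for (i = 1; i < N; i++)
    a[i] = a[i - 1] + c[i];
}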