Xinliang David Li <davi...@google.com> wrote:
>On Mon, Aug 19, 2013 at 11:53 AM, Richard Biener
><richard.guent...@gmail.com> wrote:
>> Xinliang David Li <davi...@google.com> wrote:
>>>+cc auto-vectorizer maintainers.
>>>
>>>David
>>>
>>>On Mon, Aug 19, 2013 at 10:37 AM, Cong Hou <co...@google.com> wrote:
>>>> Nowadays, SIMD instructions play more and more important roles in
>>>> our daily computations. AVX and AVX2 have extended 128-bit
>>>> registers to 256-bit ones, and the newly announced AVX-512 further
>>>> doubles the size. The benefit we can get from vectorization will be
>>>> larger and larger. This is also a common practice in other
>>>> compilers:
>>>>
>>>> 1) Intel's ICC turns on vectorizer at O2 by default and it has been
>>>> the case for many years;
>>>>
>>>> 2) Most recently, LLVM turns it on for both O2 and Os.
>>>>
>>>>
>>>> Here we propose moving vectorization from -O3 to -O2 in GCC. Three
>>>> main concerns about this change are: 1. Does vectorization greatly
>>>> increase the generated code size? 2. How much performance can be
>>>> improved? 3. Does vectorization increase compile time
>>>> significantly?
>>>>
>>>>
>>>> I have fixed GCC bootstrap failure with vectorizer turned on
>>>> (http://gcc.gnu.org/ml/gcc-patches/2013-07/msg00497.html). To
>>>> evaluate the size and performance impact, experiments on SPEC06 and
>>>> internal benchmarks are done. Based on the data, I have tuned the
>>>> parameters for vectorizer which reduces the code bloat without
>>>> sacrificing the performance gain. There are some performance
>>>> regressions in SPEC06, and the root cause has been analyzed and
>>>> understood. I will file bugs tracking them independently. The
>>>> experiments failed on three benchmarks (please refer to
>>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56993). The experiment
>>>> result is attached here as two pdf files. Below are our summaries
>>>> of the result:
>>>>
>>>>
>>>> 1) We noticed that vectorization could increase the generated code
>>>> size, so we tried to suppress this problem by doing some tunings,
>>>> which include setting a higher loop bound so that loops with small
>>>> iterations won't be vectorized, and disabling loop versioning. The
>>>> average size increase is decreased from 9.84% to 7.08% after our
>>>> tunings (13.93% to 10.75% for Fortran benchmarks, and 3.55% to
>>>> 1.44% for C/C++ benchmarks). The code size increase for Fortran
>>>> benchmarks can be significant (from 18.72% to 34.15%), but the
>>>> performance gain is also huge. Hence we think this size increase is
>>>> reasonable. For C/C++ benchmarks, the size increase is very small
>>>> (below 3% except 447.dealII).
>>>>
>>>>
>>>> 2) Vectorization improves the performance for most benchmarks by
>>>> around 2.5%-3% on average, and much more for Fortran benchmarks. On
>>>> Sandybridge machines, the improvement can be more if using
>>>> -march=corei7 (3.27% on average) and -march=corei7-avx (4.81% on
>>>> average) (Please see the attachment for details). We also noticed
>>>> that some performance degrades exist, and after investigation, we
>>>> found some are caused by the defects of GCC's vectorization (e.g.
>>>> GCC's SLP could not vectorize a group of accesses if the number of
>>>> group cannot be divided by VF
>>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49955, and any data
>>>> dependence between statements can prevent vectorization), which
>>>> can be resolved in the future.
>>>>
>>>>
>>>> 3) As last, we found that introducing vectorization almost does not
>>>> affect the build time. GCC bootstrap time increase is negligible.
>>>>
>>>>
>>>> As a reference, Richard Biener is also proposing to move
>>>> vectorization to O2 by improving the cost model
>>>> (http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00904.html).
>>
>> And my conclusion is that we are not ready for this. The compile
>> time cost does not outweigh the benefit.
>
>Can you elaborate on your reasoning?

I have done measurements with SPEC 2006, selectively turning on parts
of the vectorizer at O2. Vectorizing has both a compile-time (around
10%) and a code-size (up to 15%) impact. With the full feature set,
vectorization significantly regresses the runtime of quite a number of
benchmarks. With a reduced feature set - basically trying to vectorize
only obviously profitable cases - these regressions can be avoided, but
progressions remain on only two SPEC fp cases. As most user
applications fall into the SPEC int category, a 10% compile-time and
15% code-size regression for no gain is no good.

Richard.

>thanks,
>
>David
>
>
>>
>> Richard.
>>
>>>>
>>>> Vectorization has great performance potential -- the more people
>>>> use it, the likely it will be further improved -- turning it on at
>>>> O2 is the way to go ...
>>>>
>>>>
>>>> Thank you!
>>>>
>>>>
>>>> Cong Hou
>>
>>
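
To make the loop-bound and code-size discussion in this thread a bit more concrete, here is a minimal sketch of the two kinds of loops involved. It is only an illustration: the function names saxpy and dot3 are made up, and neither loop is taken from SPEC or from the patches referenced above. The first loop is long, unit-stride, and free of aliasing, which is the sort of case a vectorizer usually handles well. The second has a tiny fixed trip count, which is the sort of loop a higher loop-bound threshold is meant to leave alone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: a long, unit-stride loop. The restrict
   qualifiers promise that x and y do not overlap, so no runtime alias
   check (and hence no versioned copy of the loop) is needed for that. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Hypothetical example: a loop with a small, fixed trip count.
   Vector code for it would add size for little or no speedup. */
float dot3(const float *u, const float *v)
{
    float s = 0.0f;
    for (int i = 0; i < 3; ++i)
        s += u[i] * v[i];
    return s;
}
```

One way to see the code-size effect on a file like this is to compile it twice, once with `gcc -std=c99 -O2 -c` and once with `gcc -std=c99 -O2 -ftree-vectorize -c`, and compare the two objects with `size`. The numbers will depend on the GCC version and target, so this is a way to observe the trade-off, not a reproduction of the figures quoted above.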