Re: Propose moving vectorization from -O3 to -O2.

Richard Biener Tue, 20 Aug 2013 03:58:13 -0700

Xinliang David Li <davi...@google.com> wrote:
>On Mon, Aug 19, 2013 at 11:53 AM, Richard Biener
><richard.guent...@gmail.com> wrote:
>> Xinliang David Li <davi...@google.com> wrote:
>>>+cc auto-vectorizer maintainers.
>>>
>>>David
>>>
>>>On Mon, Aug 19, 2013 at 10:37 AM, Cong Hou <co...@google.com> wrote:
>>>> Nowadays, SIMD instructions play an increasingly important role in
>>>> our daily computations. AVX and AVX2 have extended 128-bit registers
>>>> to 256-bit ones, and the newly announced AVX-512 further doubles the
>>>> size, so the benefit we can get from vectorization will keep growing.
>>>> Enabling vectorization at -O2 is also common practice in other
>>>> compilers:
>>>>
>>>> 1) Intel's ICC has turned on the vectorizer at -O2 by default for many
>>>> years;
>>>>
>>>> 2) Most recently, LLVM has turned it on at both -O2 and -Os.
>>>>
>>>>
>>>> Here we propose moving vectorization from -O3 to -O2 in GCC. Three
>>>> main concerns about this change are: 1. Does vectorization greatly
>>>> increase the generated code size? 2. How much performance can be
>>>> improved? 3. Does vectorization increase compile time significantly?
>>>>
>>>>
>>>> I have fixed the GCC bootstrap failure with the vectorizer turned on
>>>> (http://gcc.gnu.org/ml/gcc-patches/2013-07/msg00497.html). To evaluate
>>>> the size and performance impact, we ran experiments on SPEC06 and
>>>> internal benchmarks. Based on the data, I have tuned the vectorizer
>>>> parameters to reduce code bloat without sacrificing the performance
>>>> gain. There are some performance regressions in SPEC06; their root
>>>> causes have been analyzed and understood, and I will file bugs to
>>>> track them independently. The experiments failed on three benchmarks
>>>> (please refer to http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56993).
>>>> The experiment results are attached as two PDF files. Below is a
>>>> summary of the results:
>>>>
>>>>
>>>> 1) We noticed that vectorization can increase the generated code
>>>> size, so we tried to mitigate this with some tuning: setting a higher
>>>> loop bound so that loops with small iteration counts won't be
>>>> vectorized, and disabling loop versioning. After tuning, the average
>>>> size increase drops from 9.84% to 7.08% (13.93% to 10.75% for Fortran
>>>> benchmarks, and 3.55% to 1.44% for C/C++ benchmarks). The code size
>>>> increase for Fortran benchmarks can be significant (from 18.72% to
>>>> 34.15%), but the performance gain is also huge, so we think this size
>>>> increase is reasonable. For C/C++ benchmarks, the size increase is
>>>> very small (below 3% except for 447.dealII).
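To make the two tunings in 1) concrete, here is a hand-written C sketch of what loop versioning with a minimum trip-count guard amounts to: a runtime check picks a `restrict`-qualified fast path that the compiler is free to vectorize, or a scalar fallback when the destination overlaps an input or the loop is short. The names `add_versioned` and `MIN_VECT_ITERS`, and the threshold value, are hypothetical; a real vectorizer emits such checks internally, and the email does not say which GCC parameters were changed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a "higher loop bound": loops shorter than this
   stay scalar. The value is made up for illustration. */
#define MIN_VECT_ITERS 16

/* Fast path: restrict promises dst does not alias a or b, so the compiler
   may use SIMD instructions here. */
static void add_noalias(float *restrict dst, const float *restrict a,
                        const float *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Fallback path: makes no aliasing assumption, so overlapping arrays get
   ordinary sequential semantics. */
static void add_scalar(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Byte-range overlap test done on integer addresses. */
static int ranges_overlap(const void *p, size_t pbytes,
                          const void *q, size_t qbytes)
{
    uintptr_t pa = (uintptr_t)p, qa = (uintptr_t)q;
    return pa < qa + qbytes && qa < pa + pbytes;
}

/* Hypothetical "versioned" loop: trip-count guard plus runtime alias check.
   Only dst is written, so a and b may alias each other safely. */
void add_versioned(float *dst, const float *a, const float *b, size_t n)
{
    size_t bytes = n * sizeof(float);
    if (n >= MIN_VECT_ITERS &&
        !ranges_overlap(dst, bytes, a, bytes) &&
        !ranges_overlap(dst, bytes, b, bytes))
        add_noalias(dst, a, b, n);
    else
        add_scalar(dst, a, b, n);
}
```

Each versioned loop carries both copies plus the check, which is one source of the extra code size; with versioning disabled, a loop whose aliasing cannot be ruled out at compile time simply stays scalar.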
>>>>
>>>>
>>>> 2) Vectorization improves performance for most benchmarks, by around
>>>> 2.5%-3% on average, and by much more for Fortran benchmarks. On
>>>> Sandybridge machines, the improvement is larger with -march=corei7
>>>> (3.27% on average) and -march=corei7-avx (4.81% on average) (please
>>>> see the attachment for details). We also noticed some performance
>>>> degradations; after investigation, we found that some are caused by
>>>> defects in GCC's vectorizer (e.g. GCC's SLP cannot vectorize a group
>>>> of accesses if the group size is not divisible by the vectorization
>>>> factor (VF), see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49955,
>>>> and any data dependence between statements can prevent
>>>> vectorization). These can be resolved in the future.
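The two kinds of limitation mentioned in 2) can be illustrated with small C loops. These are hypothetical examples of the patterns described, not the test case attached to PR 49955, and whether a particular GCC release vectorizes them depends on the version and target.

```c
#include <stddef.h>

/* Hypothetical 3-component point: each iteration touches an interleaved
   group of 3 floats, and 3 is not a multiple of a 4-float vector (VF = 4
   for float on SSE), the shape of group the email says SLP struggled with. */
typedef struct { float x, y, z; } vec3;

void scale_vec3(vec3 *p, float s, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        p[i].x *= s;
        p[i].y *= s;
        p[i].z *= s;
    }
}

/* Data dependence: iteration i reads the value written by iteration i - 1,
   so the iterations cannot simply be mapped onto parallel vector lanes. */
void prefix_sum(float *a, size_t n)
{
    for (size_t i = 1; i < n; i++)
        a[i] += a[i - 1];
}
```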
>>>>
>>>>
>>>> 3) Finally, we found that enabling vectorization has almost no effect
>>>> on build time. The GCC bootstrap time increase is negligible.
>>>>
>>>>
>>>> As a reference, Richard Biener is also proposing to move vectorization
>>>> to O2 by improving the cost model
>>>> (http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00904.html).
>>
>> And my conclusion is that we are not ready for this. The benefit does
>> not outweigh the compile-time cost.
>
>Can you elaborate on your reasoning?


I have done measurements with SPEC 2006, selectively turning on parts of the
vectorizer at -O2. Vectorizing has both a compile-time (around 10%) and a
code-size (up to 15%) impact. With the full feature set, vectorization
significantly regresses the runtime of quite a number of benchmarks. With a
reduced feature set (basically trying to vectorize only obviously profitable
cases) these regressions can be avoided, but improvements remain on only two
SPEC FP cases. As most user applications fall into the SPEC INT category, a 10%
compile-time and 15% code-size regression for no gain is no good.
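The "reduced feature set" idea, vectorizing only when it is clearly profitable, comes down to a cost comparison. Below is a simplified C sketch of a break-even trip-count calculation: the vector version pays a fixed setup overhead, runs n / vf vector iterations, and finishes the remainder in a scalar epilogue. The function name `min_profitable_iters`, the abstract cost units and all numbers are made up for illustration; GCC's actual cost model is considerably more detailed.

```c
/* Hypothetical, simplified cost model (costs in abstract units, all >= 0).
   Returns the smallest trip count n <= max_n for which the vectorized loop
   is strictly cheaper than the scalar loop, or -1 if there is none. */
long min_profitable_iters(long scalar_cost, long vector_cost, long vf,
                          long overhead, long max_n)
{
    /* One vector iteration replaces vf scalar iterations. If that saves
       nothing, no trip count can pay back a non-negative overhead. */
    if (vector_cost >= vf * scalar_cost)
        return -1;

    for (long n = 1; n <= max_n; n++) {
        long scalar_total = n * scalar_cost;
        /* Setup overhead + full vector iterations + scalar remainder. */
        long vector_total = overhead + (n / vf) * vector_cost
                            + (n % vf) * scalar_cost;
        if (vector_total < scalar_total)
            return n;
    }
    return -1;
}
```

For example, with a scalar cost of 4, a vector cost of 6, a VF of 4 and an overhead of 20, the sketch returns 12: shorter loops would run slower when vectorized, which is the situation a minimum loop bound is meant to avoid.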

Richard.

>thanks,
>
>David
>
>
>>
>> Richard.
>>
>>>>
>>>> Vectorization has great performance potential. The more people use
>>>> it, the more likely it is to be further improved, so turning it on at
>>>> O2 is the way to go.
>>>>
>>>>
>>>> Thank you!
>>>>
>>>>
>>>> Cong Hou
>>
>>
