On Mon, Aug 19, 2013 at 11:53 AM, Richard Biener <richard.guent...@gmail.com> wrote:
> Xinliang David Li <davi...@google.com> wrote:
>> +cc auto-vectorizer maintainers.
>>
>> David
>>
>> On Mon, Aug 19, 2013 at 10:37 AM, Cong Hou <co...@google.com> wrote:
>>> Nowadays, SIMD instructions play an increasingly important role in everyday computation. AVX and AVX2 have extended the 128-bit registers to 256 bits, and the newly announced AVX-512 doubles the size again. The benefit we can get from vectorization keeps growing. Enabling the vectorizer by default at -O2 is also common practice in other compilers:
>>>
>>> 1) Intel's ICC has turned on its vectorizer at -O2 by default for many years;
>>>
>>> 2) Most recently, LLVM turned it on for both -O2 and -Os.
>>>
>>> Here we propose moving vectorization from -O3 to -O2 in GCC. The three main concerns about this change are: 1. Does vectorization greatly increase the generated code size? 2. How much does performance improve? 3. Does vectorization increase compile time significantly?
>>>
>>> I have fixed the GCC bootstrap failure with the vectorizer turned on (http://gcc.gnu.org/ml/gcc-patches/2013-07/msg00497.html). To evaluate the size and performance impact, I ran experiments on SPEC06 and internal benchmarks. Based on the data, I have tuned the vectorizer parameters to reduce the code bloat without sacrificing the performance gain. There are some performance regressions in SPEC06; their root causes have been analyzed and understood, and I will file bugs to track them independently. The experiments failed on three benchmarks (please refer to http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56993). The experiment results are attached here as two PDF files. Below is our summary of the results:
>>>
>>> 1) We noticed that vectorization can increase the generated code size, so we tried to mitigate this with some tunings, which include setting a higher loop bound so that loops with few iterations are not vectorized, and disabling loop versioning. After these tunings, the average size increase drops from 9.84% to 7.08% (from 13.93% to 10.75% for Fortran benchmarks, and from 3.55% to 1.44% for C/C++ benchmarks). The code size increase for some Fortran benchmarks is still significant (from 18.72% to 34.15%), but the performance gain is also huge, so we think this size increase is reasonable. For C/C++ benchmarks, the size increase is very small (below 3%, except for 447.dealII).
>>>
>>> 2) Vectorization improves performance for most benchmarks, by around 2.5%-3% on average, and by much more for Fortran benchmarks. On Sandy Bridge machines, the improvement is larger when using -march=corei7 (3.27% on average) or -march=corei7-avx (4.81% on average) (please see the attachment for details). We also noticed some performance degradations; after investigation, we found that some of them are caused by limitations of GCC's vectorizer (e.g. GCC's SLP cannot vectorize a group of accesses whose size is not divisible by the VF, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49955, and any data dependence between statements can prevent vectorization; a minimal sketch of such loops appears after this thread), which can be resolved in the future.
>>>
>>> 3) Lastly, we found that enabling vectorization barely affects build time; the increase in GCC bootstrap time is negligible.
>>>
>>> As a reference, Richard Biener is also proposing to move vectorization to -O2 by improving the cost model (http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00904.html).
>
> And my conclusion is that we are not ready for this. The compile-time cost does not outweigh the benefit.
Can you elaborate on your reasoning? Thanks,

David

>
> Richard.
>
>>> Vectorization has great performance potential -- the more people use it, the more likely it is to be further improved -- turning it on at -O2 is the way to go ...
>>>
>>> Thank you!
>>>
>>> Cong Hou
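
For reference, below is a minimal C sketch of the two loop shapes mentioned in point 2 of the proposal: a group of strided accesses whose size does not match the vectorization factor, and a loop-carried data dependence. The loop bodies, function names, and the compile command (including the loop-bound and loop-versioning parameters alluded to in point 1) are illustrative assumptions, not code or settings taken from the benchmarks or the patch.

/* Minimal sketch only -- the loops, names, and parameter values below are
   assumptions for illustration, not taken from the proposal or the
   benchmarks.  A possible compile command, using existing GCC knobs that
   roughly correspond to the tunings described in point 1 (higher loop
   bound, no loop versioning):

     gcc -O2 -ftree-vectorize \
         --param min-vect-loop-bound=8 \
         --param vect-max-version-for-alias-checks=0 \
         --param vect-max-version-for-alignment-checks=0 \
         -fdump-tree-vect-details -c vect_sketch.c

   The vect dump shows which loops were vectorized and why others were not.  */

#define N 1024

float a[N], b[N], c[N];

/* Accesses grouped with a stride of 3.  With 128-bit vectors of float the
   vectorization factor (VF) is 4, which a group size of 3 does not match
   evenly -- the kind of case PR49955 (cited in point 2) describes, where
   GCC's SLP gives up.  */
void
grouped_accesses (void)
{
  int i;
  for (i = 0; i < N / 3; i++)
    {
      a[3 * i]     = b[3 * i]     + 1.0f;
      a[3 * i + 1] = b[3 * i + 1] + 1.0f;
      a[3 * i + 2] = b[3 * i + 2] + 1.0f;
    }
}

/* A data dependence between statements: each iteration reads the value the
   previous iteration wrote, so the loop cannot be vectorized as written.  */
void
carried_dependence (void)
{
  int i;
  for (i = 1; i < N; i++)
    a[i] = a[i - 1] + c[i];
}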