Re: SPEC2000 comparison of LLVM-3.2 and coming GCC4.8 on x86/x86-64

Richard Biener Thu, 07 Feb 2013 08:09:39 -0800

On Thu, Feb 7, 2013 at 4:26 PM, Vladimir Makarov <vmaka...@redhat.com> wrote:
> I've added pages comparing LLVM-3.2 and the coming GCC 4.8 at
> http://vmakarov.fedorapeople.org/spec/.
>
> The pages are accessible by links named GCC-LLVM comparison, 2013, x86 and
> x86-64 SPEC2000 under link named 2013. You can find these links at the
> bottom of the left frame.
>
> If you prefer reading the comparison by email, here is a copy of the
> page accessible through the link named 2013:
>
>
> Comparison of GCC and LLVM in 2013.
>
> This year the comparison is done on the coming *GCC 4.8* and on
> *LLVM 3.2*, which was released at the very end of 2012.
>
> As usual, I focus mostly on comparing the compilers as *optimizing*
> compilers on the major platform x86/x86-64.  I don't consider other
> aspects of the compilers, such as quality of debug information
> (especially in optimization modes), supported languages, standards
> and extensions (e.g. OMP), supported targets and ABIs, support of
> just-in-time compilation, etc.
>
> This year I did the comparison using the following major options,
> which are equivalent from my point of view:
>
> o *-O0 -g, -Os, -O1, -O2, -O3, -O4* for LLVM3.2
> o *-O0 -g, -Os, -O1, -O2, -O3, -Ofast -flto* for GCC4.8


On the web-page you say that you use -Ofast -fno-fast-math (because
that is what LLVM does with -O4).  For GCC that's equivalent to -O3
(well, apart from that you enable -flto).  So you can as well say you
tested -O3 -flto.

For 32bit you used -mtune=corei7 -march=i686 - did you disable
CPU features like SSE on purpose?  Vectorization at -O3+ should
have used those (though without -ffast-math FP vectorization is
seriously restricted).

It would be nice to see -O3 -ffast-math vs. whatever LLVM equivalent
is available.

Also note that for SPEC -funroll-loops helps GCC (yes ... we don't
enable that by default at -O3, we probably should).
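
As a rough illustration of the flag sets discussed above, here is a minimal Python sketch that builds the corresponding GCC command lines. The variant labels, the helper name `gcc_command`, and the file names `bench.c` and `bench` are placeholders invented for this sketch; only the compiler flags come from the message.

```python
import shlex

# Flag sets named in the message above. The dictionary keys are
# hypothetical labels chosen for this sketch.
GCC_VARIANTS = {
    "peak-as-tested": ["-Ofast", "-fno-fast-math", "-flto"],
    "peak-equivalent": ["-O3", "-flto"],  # what the message says the above amounts to
    "fast-math": ["-O3", "-ffast-math"],
    "unrolled": ["-O3", "-funroll-loops"],
}


def gcc_command(variant, source="bench.c", output="bench"):
    """Return the argv list for one variant (file names are placeholders)."""
    return ["gcc", *GCC_VARIANTS[variant], source, "-o", output]


if __name__ == "__main__":
    # Print one command line per variant without running the compiler.
    for name in GCC_VARIANTS:
        print(f"{name:16} {shlex.join(gcc_command(name))}")
```

The sketch only prints command lines, so it can be checked without a compiler or the SPEC sources installed.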

I don't know whether LLVM with -O4 creates fat objects as we do
(you can link them without -flto).  If not, then for compile-time
you should use -fno-fat-lto-objects.  Does LLVM parallelize the
LTO link stage?  If so you should compare with -flto=jobserver
or -flto=number-of-available-cores.  If not you should compare with
-flto-partition=none (that will save some I/O and processing time).
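
The choice of LTO flags described above can be sketched as a small decision function. This is an assumption-laden illustration: the function name `gcc_lto_flags` and its parameters are hypothetical, and the two questions about LLVM's behaviour remain open in the message, so they are passed in as inputs rather than answered.

```python
import os


def gcc_lto_flags(llvm_emits_fat_objects, llvm_parallel_link,
                  under_make_jobserver=False):
    """Pick GCC LTO flags for a like-for-like compile-time comparison.

    All three parameters are hypothetical inputs for this sketch; the
    flags themselves are the ones named in the message.
    """
    flags = ["-flto"]
    if not llvm_emits_fat_objects:
        # Skip emitting regular object code next to the LTO bytecode.
        flags.append("-fno-fat-lto-objects")
    if llvm_parallel_link:
        if under_make_jobserver:
            # Only meaningful when the link runs under a make jobserver.
            flags[0] = "-flto=jobserver"
        else:
            flags[0] = f"-flto={os.cpu_count() or 1}"
    else:
        # Serial link: a single partition saves some I/O and processing.
        flags.append("-flto-partition=none")
    return flags


if __name__ == "__main__":
    print(" ".join(gcc_lto_flags(False, False)))
```

With both inputs false, the function returns the serial, slim-object combination suggested for compile-time measurements.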

As a general note - we don't pay much attention to SPEC 2000
performance these days but instead look at SPEC CPU 2006 ...

Thanks for the comparison!
Richard.

> I tried to decrease the number of graphs, but there are still too
> many.  Therefore I removed the data for -O0 -g and -Os from the
> graphs, but I still post some data about these modes below.  If you
> need exact numbers you should look at the tables from which the
> graphs were generated.
>
> I had to use -O0 for compilation of SPECInt2000 254.gap for both
> compilers as LLVM3.2 can not generate correct code in any optimization
> mode for this test.
>
> Here are my conclusions from analyzing the data:
>
> o LLVM regressed in its supported non-experimental languages, which
>   makes a performance comparison much harder for me.  Earlier, LLVM
>   was able to use GCC front-ends (although old ones), including the
>   Fortran front-end.  Now the *CLANG* driver, when it processes
>   Fortran programs, just calls the GCC Fortran compiler.  So a
>   comparison of CLANG LLVM and GCC on SPECFP2000 makes no sense (it
>   would be just a comparison of GCC 4.8 and the version of GCC
>   installed by default on a given machine), although you can find
>   such comparisons on the Internet (e.g. on phoronix.com).
>
>   Therefore I had to use *Dragonegg* (a GCC plugin which uses the
>   LLVM backend instead of the GCC backend) to compile the Fortran
>   benchmarks with LLVM.
>
>   Although CLANG made LLVM less dependent on GCC, *LLVM is still
>   heavily dependent on GCC and more generally on other GNU projects*
>   (GOLD, binutils etc).  Industrial compilers (including Intel
>   compilers, SUN studio compilers, OPEN64, Pathscale) usually support
>   the triad of languages C, C++, and Fortran.  It is a pretty big
>   investment to implement a Fortran front-end, especially with
>   language-dependent optimizations.
>
>
> o The difference between LLVM and GCC on integer benchmarks is only
>   about 8% for -O3 and 3-4% for 32- and 64-bit peak performance (when
>   LTO is used by both compilers).  On floating point benchmarks, the
>   difference is 3% and 9% for -O3 in 32- and 64-bit modes
>   respectively, and 6% and 12% for the peak performance.
>
>   To put this in perspective, the performance difference between
>   LLVM2.9 and GCC4.7 reached 20% (on SPECFP2000 in 32- and 64-bit
>   modes for -O3).  So *LLVM has made significant progress* from the
>   performance point of view since version 2.9.
>
>   I believe such progress was achieved mostly because of the *new RA*
>   introduced in LLVM 3.0 and *auto-vectorization*.  By the way,
>   although the new LLVM RA is much better than the old one, I think it
>   is a mistake that the new RA still does not use graph-coloring based
>   RA, which has the potential to improve performance even more.
>
> o In 2011, I used LLVM with the GCC front-end and showed that the
>   *common opinion "LLVM is a faster compiler than GCC" is a myth* when
>   you compare compilers in modes generating the same code quality.
>
>   It is still close to true for LLVM with the CLANG front-end.  For
>   example, in the case of 32-bit SPECInt2000, the code generated by
>   GCC4.8 in -O1 mode is 16% better than that generated by LLVM3.2 in
>   -O1 mode and 1% better than the code generated by LLVM3.2 in -O2
>   mode, but the GCC compiler in -O1 mode is 2% and 10% faster than
>   LLVM3.2 in -O1 and -O2 mode respectively.  It means that GCC -O1 is
>   closer to CLANG LLVM3.2 -O2 from the point of view of performance
>   and compiler speed.
>
>   Where GCC is really slower (2.5 times) than CLANG LLVM3.2 is in LTO
>   mode.
>
> o *GCC has better code size optimizations (-Os)*: GCC4.8 generates on
>   average 6-7% smaller code (text + data segments) for SPECInt2000
>   than LLVM3.2.
>
> o In the widely used debugging mode (-O0 -g), GCC4.8 is only about 5%
>   slower than LLVM3.2 but generates about 16% and 13% smaller and 18%
>   and 10% faster SPECInt2000 code in 32-bit and 64-bit mode
>   respectively.
>
> o Although LLVM supports many targets, it is focused mostly on the
>   development of two of them, x86/x86-64 and ARM.  I see two pieces
>   of supporting evidence for this thesis.
>
>   One is that dragonegg supports only the two mentioned targets.  You
>   cannot even benchmark SPECFP for LLVM on other targets, as you
>   cannot use LLVM to compile Fortran programs.
>
>   Another one is that the quality of code generated by LLVM for other
>   targets is not as good as that generated by GCC.  For example, on
>   POWER (the second most important server architecture), the LLVM rate
>   for SPECInt2000 or for part of SPECFP2000 (4 benchmarks in C) is
>   about 20% worse than for GCC.
>
>   So I would not recommend switching to LLVM for any Linux
>   distribution, because other targets are not as refined as x86/x86-64
>   from the performance point of view (there are a lot of other aspects
>   besides generated code performance which make such switching
>   unreasonable).  By the way, only the LLVM-3.2 binaries for MACOS
>   provided by the LLVM site are compiled by LLVM itself.  For Linux
>   and FreeBSD (this project officially switched from GCC to LLVM
>   because of the new version of the GNU license for GCC), the binaries
>   are still compiled by GCC (by GCC 4.6 and GCC 4.2 respectively).
>
>   Still, I think that the GCC community should pay more attention to
>   improving code quality for x86/x86-64, as LLVM is catching up with
>   us.
>
