On Thu, Feb 7, 2013 at 4:26 PM, Vladimir Makarov <vmaka...@redhat.com> wrote:
> I've added pages comparing LLVM-3.2 and the upcoming GCC 4.8 at
> http://vmakarov.fedorapeople.org/spec/.
>
> The pages are accessible through the links named GCC-LLVM comparison,
> 2013, x86 and x86-64 SPEC2000 under the link named 2013. You can find
> these links at the bottom of the left frame.
>
> If you prefer to read the comparison by email, here is a copy of the
> page accessible through the link named 2013:
>
>
> Comparison of GCC and LLVM in 2013.
>
> This year the comparison is done on the upcoming *GCC 4.8* and on
> *LLVM 3.2*, which was released at the very end of 2012.
>
> As usual, I focus mostly on comparing the compilers as *optimizing*
> compilers on the major platform x86/x86-64. I don't consider other
> aspects of the compilers, such as the quality of debug information
> (especially in optimization modes), supported languages, standards
> and extensions (e.g. OMP), supported targets and ABIs, support for
> just-in-time compilation, etc.
>
> This year I did the comparison using the following major options,
> which are equivalent from my point of view:
>
> o *-O0 -g, -Os, -O1, -O2, -O3, -O4* for LLVM3.2
> o *-O0 -g, -Os, -O1, -O2, -O3, -Ofast -flto* for GCC4.8

On the web-page you say that you use -Ofast -fno-fast-math (because
that is what LLVM does with -O4). For GCC that's equivalent to -O3
(well, apart from the fact that you also enable -flto). So you might as
well say you tested -O3 -flto.

For 32bit you used -mtune=corei7 -march=i686: did you disable CPU
features like SSE on purpose? Vectorization at -O3+ should have used
those (though without -ffast-math, FP vectorization is seriously
restricted). It would be nice to see -O3 -ffast-math vs. whatever LLVM
equivalent is available.

Also note that for SPEC, -funroll-loops helps GCC (yes ... we don't
enable that by default at -O3; we probably should).

I don't know whether LLVM with -O4 creates fat objects as we do (you can
link them without -flto). If not, then for compile-time measurements you
should use -fno-fat-lto-objects.

Does LLVM parallelize the LTO link stage? If so, you should compare with
-flto=jobserver or -flto=N (N being the number of available cores). If
not, you should compare with -flto-partition=none (that will save some
I/O and processing time).

As a general note, we don't pay much attention to SPEC 2000 performance
these days but look at SPEC CPU 2006 instead ...

Thanks for the comparison!
Richard.

> I tried to reduce the number of graphs, but there are still too many.
> Therefore I removed the data for -O0 -g and -Os from the graphs, but I
> still post some data about these modes below. If you need exact
> numbers, look at the tables from which the graphs were generated.
>
> I had to use -O0 to compile SPECInt2000 254.gap with both compilers,
> as LLVM3.2 cannot generate correct code for this test in any
> optimization mode.
>
> Here are my conclusions from analyzing the data:
>
> o LLVM has regressed in the set of supported non-experimental
>   languages, which makes a performance comparison much harder for me.
>   Earlier, LLVM was able to use GCC front-ends (albeit old ones),
>   including the Fortran front-end. Now the *CLANG* driver, when it
>   processes Fortran programs, just calls the GCC Fortran compiler.
>   So comparing CLANG LLVM and GCC on SPECFP2000 makes no sense (it
>   would just be a comparison of GCC 4.8 and the version of GCC normally
>   used on a given machine), although you can find such comparisons on
>   the Internet (e.g. on phoronix.com).
>
>   Therefore I had to use *Dragonegg* (a GCC plugin which uses the LLVM
>   backend instead of the GCC backend) to generate the Fortran
>   benchmarks with LLVM.
>
>   Although CLANG made LLVM less dependent on GCC, *LLVM is still
>   heavily dependent on GCC and more generally on other GNU projects*
>   (GOLD, binutils etc). Industrial compilers (including the Intel
>   compilers, SUN Studio compilers, OPEN64, Pathscale) usually support
>   the triad of languages C, C++, and Fortran. Implementing a Fortran
>   front-end is a pretty big investment, especially with
>   language-dependent optimizations.
>
> o The difference between LLVM and GCC on integer benchmarks is only
>   about 8% for -O3 and 3-4% for 32- and 64-bit peak performance (when
>   LTO is used by both compilers). On floating point benchmarks, the
>   difference is 3% and 9% for -O3 and 6% and 12% for peak
>   performance, in 32- and 64-bit modes respectively.
>
>   For perspective, the performance difference between LLVM2.9 and
>   GCC4.7 reached 20% (on SPECFP2000 in 32- and 64-bit modes for -O3).
>   So *LLVM has made significant progress* from the performance point
>   of view since version 2.9.
>
>   I believe this progress comes mostly from the *new RA* (register
>   allocator) introduced in LLVM 3.0 and from *auto-vectorization*. By
>   the way, although the new LLVM RA is much better than the old one, I
>   think it is a mistake that it is still not a graph-coloring based
>   RA, which has the potential to improve performance even more.
>
> o In 2011, I used LLVM with the GCC front-end and showed that the
>   *common opinion "LLVM is a faster compiler than GCC" is a myth* when
>   you compare the compilers in modes generating the same code quality.
>
>   That conclusion still roughly holds for LLVM with the CLANG
>   front-end.
For > example, in case of 32-bit SPECInt2000 the code quality generated by > GCC4.8 in -O1 mode is 16% better than one generated by LLVM3.2 in > -O1 mode and 1% better than code generated by LLVM3.2 in -O2 mode, > but GCC compiler in -O1 mode is 2% and 10% faster than LLVM3.2 > correspondingly in -O1 and -O2 mode. It means that GCC -O1 is > closer to CLANG LLVM3.2 -O2 with the performance and compiler speed > point of view. > > Where GCC is really slower (2.5 times) than CLANG LLVM3.2 is in LTO > mode. > > o *GCC has better code size optimizations (-Os)*, GCC4.8 generates in > average 6-7% smaller code (text + data segments) of SPECInt2000 than > LLVM3.2. > > o In widely used debugging mode (-O0 -g), GCC4.8 is only about 5% > slower than LLVM3.2 but generates about 16% and 13% smaller and 18% > and 10% faster SPECInt2000 code correspondingly in 32-bit and 64-bit > mode. > > o Despite that LLVM supports many targets, LLVM is focused mostly on > developments two of them x86/x86-64 and ARM. I see two supporting > evidence for this thesis. > > One is that dragonegg supports only the two mention targets. You > even can not benchmark SPECFP for LLVM on other targets as you can > not use LLVM to compile Fortran programs. > > Another one is that the quality of code generated by LLVM > for other targets is not so good as one generated by GCC. For POWER > example (second most important server architecture), LLVM rate for > SPECInt2000 or part of SPECFP2000 (4 benchmarks on C) is about 20% > worse than for GCC. > > So I would not recommend switching to LLVM for any Linux > distribution because other targets are not refined as x86/x86-64 > with performance point of view (there are a lot of other aspects > besides generated code performance which make such switching > unreasonable). By the way, only LLVM-3.2 binaries for MACOS > provided by LLVM site are compiled by LLVM itself. 
For Linux and > FreeBSD (this project officially switched from GCC to LLVM because > of new version GNU license for GCC), the binaries are still compiled > by GCC (correspondingly by GCC 4.6 and GCC 4.2). > > Still I think that GCC community should pay more attention to > improving code quality for x86/x86-64 as LLVM is catching us up. >