Steven Bosscher wrote:
Hello,

For some time now, I've wanted to see where compile time goes in a
typical GCC build, because nobody really seems to know what the
compiler spends its time on. The impressions that get published about
gcc usually indicate that there is at least a feeling that GCC is not
getting faster, and that parts of the compiler are unreasonably slow.

It is just a feeling. In fact, since 4.2, GCC has been getting faster (if we ignore LTO). My feeling is that LLVM is getting slower. The gap in compilation speed between GCC 4.5 and LLVM 2.7 is less than 10% on x86_64 SPECInt2000 for -O2.

The feeling that GCC is getting slower probably comes from people who recently switched from 3.x versions. In comparison with those versions GCC did become slower, with the maximum slowdown reached in GCC 4.2: 25% on SPECInt2000 and 40% on SPECFP2000, on x86_64 for -O2.

All these GCC version comparisons can be found at http://vmakarov.fedorapeople.org/spec/.

As I wrote, GCC is not such a slow compiler. People sometimes exaggerate this problem. For example, the x86_64 Sun compiler, which generates code of about the same quality as GCC, is 2 times slower than GCC.

We should work on speeding GCC up, but the first priority (at least for me) should be improving the generated code.

Host was cfarm gcc14 (8 x 3GHz Xeon). Target was
x86_64-unknown-linux-gnu. "Build" means non-bootstrap.

Results at the bottom of this mail.

Conclusions:


* The "slow" parts of the compiler are not exactly news: tree-PRE,
scheduling, register allocation

RA and scheduling are usually the slowest parts of an optimizing compiler because they solve NP-hard problems, and the widely used algorithms (list scheduling and graph coloring) have quadratic worst-case complexity. For example, here is a comparison of how much time LLVM-2.7 passes and the analogous GCC passes (although sometimes it is hard to find a full correspondence) spend:

LLVM-2.7                                                    GCC4.6
RA (linear scan RA + simple register coalescing)   7.2%     IRA               9%
Instruction Selection                             10.7%     combiner+reload   9%

The data are from compiling all_cp2k_gfortran.f90 (420K lines of Fortran with hundreds of functions) on x86_64 in -O3 mode. By the way, on Core2 GCC4.6 spent 235 user seconds compiling this file vs. 265 seconds for LLVM.

Also, linear scan RA is one of the fastest RA algorithms, but it is much worse (at least 2-3% on SPEC2000) than a graph coloring one.
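
To illustrate the speed difference between the two approaches, here is a minimal Python sketch. It is only an illustration under simplifying assumptions and is not how GCC or LLVM implement register allocation: live ranges are modeled as plain (name, start, end) intervals, the names linear_scan, interference_graph and greedy_color are hypothetical, and the coloring is a simple greedy heuristic rather than a full Chaitin-style allocator. The point is that linear scan makes one pass over sorted intervals, while the coloring approach first builds an interference graph, which in this sketch takes a pairwise (quadratic) comparison.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical model: (virtual register name, start point, end point), end inclusive.
Interval = Tuple[str, int, int]


def linear_scan(intervals: List[Interval], num_regs: int) -> Dict[str, Optional[int]]:
    """Assign each interval a register index, or None if it is spilled."""
    assignment: Dict[str, Optional[int]] = {}
    free = list(range(num_regs))   # registers not currently in use
    active: List[Interval] = []    # intervals holding a register, sorted by end

    for name, start, end in sorted(intervals, key=lambda iv: iv[1]):
        # Expire intervals that ended before this one starts.
        while active and active[0][2] < start:
            expired = active.pop(0)
            free.append(assignment[expired[0]])
        if free:
            assignment[name] = free.pop()
            active.append((name, start, end))
        else:
            # No register left: spill the interval that ends last.
            last = active[-1]
            if last[2] > end:
                assignment[name] = assignment[last[0]]
                assignment[last[0]] = None
                active[-1] = (name, start, end)
            else:
                assignment[name] = None
        active.sort(key=lambda iv: iv[2])  # kept simple; a real allocator would not re-sort
    return assignment


def interference_graph(intervals: List[Interval]) -> Dict[str, Set[str]]:
    """Pairwise overlap test: quadratic in the number of intervals."""
    graph: Dict[str, Set[str]] = {name: set() for name, _, _ in intervals}
    for i, (a, a_start, a_end) in enumerate(intervals):
        for b, b_start, b_end in intervals[i + 1:]:
            if a_start <= b_end and b_start <= a_end:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def greedy_color(graph: Dict[str, Set[str]], num_regs: int) -> Dict[str, Optional[int]]:
    """Color nodes in order of decreasing degree; None means spill."""
    colors: Dict[str, Optional[int]] = {}
    for node in sorted(graph, key=lambda n: len(graph[n]), reverse=True):
        used = {colors[n] for n in graph[node] if colors.get(n) is not None}
        colors[node] = next((r for r in range(num_regs) if r not in used), None)
    return colors


if __name__ == "__main__":
    ranges = [("a", 0, 4), ("b", 1, 2), ("c", 3, 6), ("d", 5, 7)]
    print(linear_scan(ranges, 2))
    print(greedy_color(interference_graph(ranges), 2))
```

With two registers both functions allocate all four example intervals without spilling; with a single register linear_scan has to spill some of them.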

I wanted to look at the same time distribution for OPEN64 because it has a sophisticated graph coloring RA algorithm. Maybe I'll do it when I learn OPEN64 better.

* Adding and subtracting the above numbers, the rest of the compiler,
which is mostly the RTL parts, still accounts for 100-17-16-8=59% of
the total compile time. This was the most surprising result for me.

I don't know whether it is a lot or not to have so much time spent in the RTL parts. But I think the time spent in the RTL parts could be decreased if RTL (magically :) had a smaller footprint and contained fewer details.

Ciao!
Steven

Thanks for the data. It was interesting to look at this one more time (after many others).
