On 2013.03.25 at 15:17 +0100, Richard Biener wrote:
> On Mon, Mar 25, 2013 at 2:24 PM, Markus Trippelsdorf
> <mar...@trippelsdorf.de> wrote:
> > On 2013.03.25 at 14:11 +0100, Richard Biener wrote:
> >> On Mon, Mar 25, 2013 at 1:56 PM, Markus Trippelsdorf
> >> <mar...@trippelsdorf.de> wrote:
> >> > On 2013.03.25 at 08:06 +0100, Markus Trippelsdorf wrote:
> >> >> On 2013.03.24 at 20:53 +0100, gcc_mailingl...@abwesend.de wrote:
> >> >> >
> >> >> > is it useful to compile gcc 4.8.0 with the lto option?
> >> >>
> >> >> If you want a (slightly) faster compiler then yes.
> >> >> Simply add "--with-build-config=bootstrap-lto" to your configuration.
> >> >> You can combine this with profile feedback: "make profiledbootstrap".
> >> >
> >> > To qualify "(slightly) faster" in the statement above, I built gcc with
> >> > four different configurations on my AMD64 4-core machine (vanilla, LTO
> >> > only, PGO only, LTO+PGO). Then I measured how much time it takes to
> >> > build the Linux kernel and Firefox.
> >> > Here are the results:
> >> >
> >> > Firefox:
> >> > vanilla: 5143.27s user 267.27s system 346% cpu 26:02.03 total
> >> > PGO    : 4590.37s user 270.21s system 344% cpu 23:28.89 total
> >> > LTO    : 5056.11s user 268.04s system 348% cpu 25:28.73 total
> >> > LTO+PGO: 4598.79s user 269.01s system 347% cpu 23:22.13 total
> >> >
> >> > kernel (measured three times):
> >> > vanilla: 382.34s user 23.74s system 334% cpu 2:01.41 total
> >> >          382.08s user 24.05s system 333% cpu 2:01.93 total
> >> >          385.20s user 23.63s system 330% cpu 2:03.73 total
> >> > PGO    : 341.18s user 23.25s system 323% cpu 1:52.71 total
> >> >          341.72s user 23.66s system 323% cpu 1:52.93 total
> >> >          340.32s user 23.42s system 326% cpu 1:51.38 total
> >> > LTO    : 381.23s user 23.55s system 328% cpu 2:03.05 total
> >> >          380.41s user 24.35s system 328% cpu 2:03.24 total
> >> >          379.47s user 23.98s system 331% cpu 2:01.82 total
> >> > LTO+PGO: 347.12s user 25.11s system 317% cpu 1:57.34 total
> >> >          344.38s user 24.05s system 326% cpu 1:52.99 total
> >> >          344.74s user 24.61s system 323% cpu 1:54.03 total
> >> >
> >> > To summarize:
> >> > * GCC built with PGO is ~10% faster than a vanilla bootstrapped
> >> >   compiler.
> >> > * GCC built with LTO only is only ~2% faster when building Firefox. The
> >> >   kernel build time difference is in the noise.
> >> > * An LTO+PGO build is almost exactly as fast as a pure PGO build.
> >> >
> >> > So it appears, contrary to the advice given above, that it is not useful
> >> > to build gcc 4.8.0 with the lto option at the moment.
> >>
> >> Probably Honza did a too good job in making sure optimizations LTO does
> >> can be done without LTO as well by fixing up GCC sources ;)
> >>
> >> Did you compare binary sizes of the compiler itself (w/o debuginfo)?
> >
> > Vanilla:
> > -rwxr-xr-x 1 markus markus 16219976 Mar 25 09:28 cc1
> > -rwxr-xr-x 1 markus markus 17762824 Mar 25 09:28 cc1plus
> > -rwxr-xr-x 1 markus markus 15354320 Mar 25 09:28 lto1
> > -rwxr-xr-x 4 markus markus   664920 Mar 25 09:28 c++
> > -rwxr-xr-x 1 markus markus   663496 Mar 25 09:28 cpp
> > -rwxr-xr-x 4 markus markus   664920 Mar 25 09:28 g++
> > -rwxr-xr-x 3 markus markus   662464 Mar 25 09:28 gcc
> >
> > PGO:
> > -rwxr-xr-x 1 markus markus 14778600 Mar 25 09:14 cc1
> > -rwxr-xr-x 1 markus markus 16106120 Mar 25 09:14 cc1plus
> > -rwxr-xr-x 1 markus markus 14054448 Mar 25 09:14 lto1
> > -rwxr-xr-x 4 markus markus   579744 Mar 25 09:14 c++
> > -rwxr-xr-x 1 markus markus   575600 Mar 25 09:14 cpp
> > -rwxr-xr-x 4 markus markus   579744 Mar 25 09:14 g++
> > -rwxr-xr-x 3 markus markus   575560 Mar 25 09:14 gcc
> >
> > LTO:
> > -rwxr-xr-x 1 markus markus 17147688 Mar 25 08:58 cc1
> > -rwxr-xr-x 1 markus markus 18728200 Mar 25 08:58 cc1plus
> > -rwxr-xr-x 1 markus markus 16227224 Mar 25 08:58 lto1
> > -rwxr-xr-x 4 markus markus   567968 Mar 25 08:58 c++
> > -rwxr-xr-x 1 markus markus   568224 Mar 25 08:58 cpp
> > -rwxr-xr-x 4 markus markus   567968 Mar 25 08:58 g++
> > -rwxr-xr-x 3 markus markus   563728 Mar 25 08:58 gcc
> >
> > LTO+PGO:
> > -rwxr-xr-x 1 root root 16319480 Mar 22 13:02 cc1
> > -rwxr-xr-x 1 root root 17616608 Mar 22 13:02 cc1plus
> > -rwxr-xr-x 1 root root 15445824 Mar 22 13:02 lto1
> > -rwxr-xr-x 4 root root   492344 Mar 22 13:02 c++
> > -rwxr-xr-x 1 root root   492320 Mar 22 13:02 cpp
> > -rwxr-xr-x 4 root root   492344 Mar 22 13:02 g++
> > -rwxr-xr-x 3 root root   492232 Mar 22 13:02 gcc
>
> Hmm, does the default --enable-plugin (GCC plugin support) which results
> in -rdynamic being used maybe prevent some of the useful LTO optimizations
> (mainly due to cost constraints)?  That is, is a LTO + PGO build with
> --disable-plugin any different?

Yes, the binary size is 8-10% smaller. Unfortunately there are no
performance improvements.

LTO+PGO-disable-plugin:
-rwxr-xr-x 1 markus markus 15025568 Mar 25 15:49 cc1
-rwxr-xr-x 1 markus markus 16198584 Mar 25 15:49 cc1plus
-rwxr-xr-x 1 markus markus 13907328 Mar 25 15:49 lto1
-rwxr-xr-x 4 markus markus   492360 Mar 25 15:49 c++
-rwxr-xr-x 1 markus markus   488240 Mar 25 15:49 cpp
-rwxr-xr-x 3 markus markus   488216 Mar 25 15:49 gcc

Firefox:
LTO+PGO-disable-plugin: 4590.55s user 273.70s system 343% cpu 23:34.65 total

kernel:
LTO+PGO-disable-plugin: 344.11s user 23.59s system 322% cpu 1:54.08 total
                        340.94s user 23.65s system 326% cpu 1:51.56 total
                        339.66s user 23.41s system 329% cpu 1:50.09 total

-- 
Markus
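
For anyone who wants to re-check the percentages quoted in this thread, here
is a minimal Python sketch that recomputes them from the raw numbers above.
It assumes user CPU time is the metric behind "~10% faster" (the thread does
not say whether user or wall-clock time was meant), and it averages the
three kernel runs. The names FIREFOX_USER, KERNEL_USER, SIZES, reduction_pct,
user_time_savings and size_savings are made up for this illustration.

```python
from statistics import mean

# User CPU seconds, copied from the timings quoted in this thread.
FIREFOX_USER = {
    "vanilla": [5143.27],
    "PGO": [4590.37],
    "LTO": [5056.11],
    "LTO+PGO": [4598.79],
    "LTO+PGO-disable-plugin": [4590.55],
}
KERNEL_USER = {
    "vanilla": [382.34, 382.08, 385.20],
    "PGO": [341.18, 341.72, 340.32],
    "LTO": [381.23, 380.41, 379.47],
    "LTO+PGO": [347.12, 344.38, 344.74],
    "LTO+PGO-disable-plugin": [344.11, 340.94, 339.66],
}
# Binary sizes in bytes, copied from the ls listings in this thread.
SIZES = {
    "LTO+PGO": {"cc1": 16319480, "cc1plus": 17616608, "lto1": 15445824},
    "LTO+PGO-disable-plugin": {"cc1": 15025568, "cc1plus": 16198584, "lto1": 13907328},
}


def reduction_pct(baseline, value):
    """Percentage by which value is smaller than baseline."""
    return 100.0 * (baseline - value) / baseline


def user_time_savings(runs):
    """Mean user-time reduction of each configuration relative to vanilla."""
    base = mean(runs["vanilla"])
    return {
        name: reduction_pct(base, mean(times))
        for name, times in runs.items()
        if name != "vanilla"
    }


def size_savings(before, after):
    """Per-binary size reduction going from 'before' to 'after'."""
    return {name: reduction_pct(before[name], after[name]) for name in before}


if __name__ == "__main__":
    for label, runs in (("Firefox", FIREFOX_USER), ("kernel", KERNEL_USER)):
        for name, pct in user_time_savings(runs).items():
            print(f"{label:8s} {name:24s} {pct:5.2f}% less user time")
    sizes = size_savings(SIZES["LTO+PGO"], SIZES["LTO+PGO-disable-plugin"])
    for name, pct in sizes.items():
        print(f"{name:8s} {pct:5.2f}% smaller with --disable-plugin")
```

Under that assumption the PGO compiler comes out at roughly 10.7% less user
time for Firefox and about 11% for the kernel, LTO alone at under 2%, and the
--disable-plugin binaries at roughly 8% to 10% smaller than the plain LTO+PGO
ones.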