Some text size measurements.

Summary:
1) LTO with -O3 bloats code considerably (16.57% total increase over -O2 on SPEC2k);
2) LTO with -O2 reduces text size compared with plain -O2 (8.10% on the SPEC06 C++ programs, 3.65% on SPEC2k);
3) The Google 4.4.3 based compiler is really effective in reducing C++ program size, which is where the size tuning was focused. This is witnessed by eon in SPEC2k and all C++ apps in SPEC06.
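For anyone who wants to re-derive the percentages: a minimal sketch, assuming each percentage is the relative change of the second data column against the first (the TOT O2 base), and that size_sum compares the summed sizes rather than averaging the per-benchmark percentages. The helper name pct_change is made up for illustration; the rows are copied from SPEC06 table 1 (TOT O3 vs TOT O2).

```python
# Rows from SPEC06 table 1 (TOT O3 vs TOT O2):
# (benchmark, base text size with TOT O2, text size with TOT O3)
rows = [
    ("471.omnetpp", 853708, 867988),
    ("450.soplex", 643273, 656349),
    ("483.xalancbmk", 3634416, 3777600),
    ("444.namd", 393142, 402038),
    ("473.astar", 102182, 111038),
]


def pct_change(base, new):
    """Relative change of `new` against `base`, in percent (hypothetical helper)."""
    return 100.0 * (new - base) / base


# size_sum is computed from the column totals, not from the row percentages.
base_sum = sum(base for _, base, _ in rows)
new_sum = sum(new for _, _, new in rows)

for name, base, new in rows:
    print(f"{name:<14} {base:>8} {new:>8} {pct_change(base, new):7.2f}%")
print(f"{'size_sum':<14} {base_sum:>8} {new_sum:>8} {pct_change(base_sum, new_sum):7.2f}%")
```

Because size_sum is a ratio of totals, large programs such as xalancbmk dominate it; a small program like astar can move a lot without changing the total much.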
Notes:
1. -ffunction-sections -Wl,-gc-sections are used in the build.
2. SPEC06 dealII does not build with trunk GCC because of a parsing error. Hj Lu, what alt source should be used? (It builds fine with the 4.4.3 compiler.)
3. xalancbmk and omnetpp do not build with the TOT gcc compiler using FDO -- the compiler ICEs. Will investigate when there is time.

David

SPEC06 C++ program data (the first data column is the TOT O2 base number)

1. TOT O3 vs TOT O2 (3.35% total increase)

471.omnetpp/     853708    867988     1.67%
450.soplex/      643273    656349     2.03%
483.xalancbmk/  3634416   3777600     3.94%
444.namd/        393142    402038     2.26%
473.astar/       102182    111038     8.67%
size_sum        5626721   5815013     3.35%

2. TOT LTO + whole program + O3 vs TOT O2 (0.35% total increase)

471.omnetpp/     853708    937728     9.84%
450.soplex/      643273    654057     1.68%
483.xalancbmk/  3634416   3540646    -2.58%
444.namd/        393142    401318     2.08%
473.astar/       102182    112538    10.13%
size_sum        5626721   5646287     0.35%

3. TOT LTO + whole program + O2 vs TOT O2 (8.10% total reduction)

471.omnetpp/     853708    822868    -3.61%
450.soplex/      643273    611653    -4.92%
483.xalancbmk/  3634416   3245157   -10.71%
444.namd/        393142    391698    -0.37%
473.astar/       102182     99586    -2.54%
size_sum        5626721   5170962    -8.10%

4. Google 4.4.3 compiler O2 vs TOT O2 (13.95% total reduction)

471.omnetpp/     853708    545840   -36.06%
450.soplex/      643273    374674   -41.76%
483.xalancbmk/  3634416   3556306    -2.15%
444.namd/        393142    329897   -16.09%
473.astar/       102182     35301   -65.45%
size_sum        5626721   4842018   -13.95%

5. Google 4.4.3 compiler O2 FDO vs TOT O2 (24.81% total reduction)

471.omnetpp/     853708    514732   -39.71%
450.soplex/      643273    357426   -44.44%
483.xalancbmk/  3634416   2985761   -17.85%
444.namd/        393142    332806   -15.35%
473.astar/       102182     39797   -61.05%
size_sum        5626721   4230522   -24.81%

6. Google 4.4.3 compiler O2 LIPO vs TOT O2 (20.86% total reduction)

471.omnetpp/     853708    559944   -34.41%
450.soplex/      643273    393399   -38.84%
483.xalancbmk/  3634416   3126428   -13.98%
444.namd/        393142    334666   -14.87%
473.astar/       102182     38749   -62.08%
size_sum        5626721   4453186   -20.86%

SPEC2k text size data:

1. tot O1 vs tot O2 (4.48% total reduction)

300.twolf/       182884    177223    -3.10%
181.mcf/          11794     11338    -3.87%
164.gzip/         36705     34388    -6.31%
186.crafty/      171663    164898    -3.94%
255.vortex/      463463    456034    -1.60%
256.bzip2/        28803     28091    -2.47%
176.gcc/        1422042   1368365    -3.77%
197.parser/      103225     96644    -6.38%
253.perlbmk/     563927    515898    -8.52%
175.vpr/         139321    134316    -3.59%
252.eon/         607704    591780    -2.62%
254.gap/         496262    459593    -7.39%
size_sum        4227793   4038568    -4.48%

2. tot O3 vs tot O2 (10.82% total size increase)

300.twolf/       182884    194620     6.42%
181.mcf/          11794     13290    12.68%
164.gzip/         36705     46049    25.46%
186.crafty/      171663    189892    10.62%
255.vortex/      463463    495875     6.99%
256.bzip2/        28803     39939    38.66%
176.gcc/        1422042   1609786    13.20%
197.parser/      103225    143558    39.07%
253.perlbmk/     563927    616855     9.39%
175.vpr/         139321    147081     5.57%
252.eon/         607704    625176     2.88%
254.gap/         496262    563187    13.49%
size_sum        4227793   4685308    10.82%

3. tot LTO + -fwhole-program + -O2 vs tot O2 (3.65% total size reduction)

300.twolf/       182884    176572    -3.45%
181.mcf/          11794      9594   -18.65%
164.gzip/         36705     34439    -6.17%
186.crafty/      171663    173071     0.82%
255.vortex/      463463    382157   -17.54%
256.bzip2/        28803     27142    -5.77%
176.gcc/        1422042   1364796    -4.03%
197.parser/      103225     94997    -7.97%
253.perlbmk/     563927    590087     4.64%
175.vpr/         139321    123572   -11.30%
252.eon/         607704    606226    -0.24%
254.gap/         496262    491006    -1.06%
size_sum        4227793   4073659    -3.65%

4. tot LTO + -fwhole-program + -O3 vs tot O2 (16.57% total increase)

300.twolf/       182884    196316     7.34%
181.mcf/          11794     11402    -3.32%
164.gzip/         36705     51477    40.25%
186.crafty/      171663    214700    25.07%
255.vortex/      463463    462329    -0.24%
256.bzip2/        28803     34950    21.34%
176.gcc/        1422042   1724868    21.30%
197.parser/      103225    124698    20.80%
253.perlbmk/     563927    729119    29.29%
175.vpr/         139321    139729     0.29%
252.eon/         607704    627194     3.21%
254.gap/         496262    611515    23.22%
size_sum        4227793   4928297    16.57%

5. tot O2 FDO vs tot O2 (1.15% total increase)

300.twolf/       182884    178247    -2.54%
181.mcf/          11794     17370    47.28%
164.gzip/         36705     42889    16.85%
186.crafty/      171663    184085     7.24%
255.vortex/      463463    483428     4.31%
256.bzip2/        28803     33635    16.78%
176.gcc/        1422042   1441797     1.39%
197.parser/      103225    140401    36.01%
253.perlbmk/     563927    546447    -3.10%
175.vpr/         139321    147153     5.62%
252.eon/         607704    572388    -5.81%
254.gap/         496262    488758    -1.51%
size_sum        4227793   4276598     1.15%

6. google local compiler O2 FDO vs tot O2 (6.33% total increase)

Pay attention to the large reduction in the C++ program's (eon) text size -- which is where the size tuning is done.

300.twolf/       182884    184736     1.01%
181.mcf/          11794     26560   125.20%
164.gzip/         36705     48499    32.13%
186.crafty/      171663    187406     9.17%
255.vortex/      463463    482090     4.02%
256.bzip2/        28803     37905    31.60%
176.gcc/        1422042   1729480    21.62%
197.parser/      103225    237148   129.74%
253.perlbmk/     563927    557040    -1.22%
175.vpr/         139321    153453    10.14%
252.eon/         607704    312506   -48.58%
254.gap/         496262    538534     8.52%
size_sum        4227793   4495357     6.33%

Also for reference, the google compiler vanilla O2 vs tot O2 -- large reduction in C++ size, while the overall size increases a little.

300.twolf/       182884    207829    13.64%
181.mcf/          11794     12008     1.81%
164.gzip/         36705     41528    13.14%
186.crafty/      171663    177104     3.17%
255.vortex/      463463    473298     2.12%
256.bzip2/        28803     37961    31.80%
176.gcc/        1422042   1592952    12.02%
197.parser/      103225    139969    35.60%
253.perlbmk/     563927    598632     6.15%
175.vpr/         139321    156869    12.60%
252.eon/         607704    322478   -46.94%
254.gap/         496262    550451    10.92%
size_sum        4227793   4311079     1.97%

7. LIPO vs tot O2 (23.32% total increase)

300.twolf/       182884    185960     1.68%
181.mcf/          11794     26544   125.06%
164.gzip/         36705     54827    49.37%
186.crafty/      171663    234494    36.60%
255.vortex/      463463    596394    28.68%
256.bzip2/        28803     40492    40.58%
176.gcc/        1422042   2070851    45.63%
197.parser/      103225    250537   142.71%
253.perlbmk/     563927    638320    13.19%
175.vpr/         139321    156117    12.06%
252.eon/         607704    370949   -38.96%
254.gap/         496262    588139    18.51%
size_sum        4227793   5213624    23.32%

8. LTO + whole-program + O2 + FDO vs O2 (0.37% total increase)

300.twolf/       182884    174919    -4.36%
181.mcf/          11794     16346    38.60%
164.gzip/         36705     40743    11.00%
186.crafty/      171663    197698    15.17%
255.vortex/      463463    395626   -14.64%
256.bzip2/        28803     36238    25.81%
176.gcc/        1422042   1439295     1.21%
197.parser/      103225    143237    38.76%
253.perlbmk/     563927    590687     4.75%
175.vpr/         139321    135276    -2.90%
252.eon/         607704    585954    -3.58%
254.gap/         496262    487289    -1.81%
size_sum        4227793   4243308     0.37%

On Tue, Nov 16, 2010 at 12:26 AM, Xinliang David Li <davi...@google.com> wrote:
> More FDO related performance numbers
>
> Experiment 1: trunk gcc O2 + FDO vs O2: FDO improves performance
> by 5% geomean
> Experiment 2: our internal gcc compiler (4.4.3 based with many local
> patches) O2 + FDO vs O2 (trunk gcc): FDO improves perf by 6.6%
> geomean
> Experiment 3: our internal gcc (4.4.3 with local patches) O2 + LIPO vs
> O2 (trunk gcc): LIPO improves by 12%
> Experiment 4: trunk gcc O2 + LTO + -fwhole-program + FDO vs O2: LTO +
> FDO improves by 10.8%
>
>
> 1. Trunk gcc FDO vs O2 (5%)
>
> 164.gzip      1324   1302    -1.64%
> 175.vpr       1694   1725     1.84%
> 176.gcc       2293   2387     4.07%
> 181.mcf       1772   1756    -0.88%
> 186.crafty    2320   2280    -1.75%
> 197.parser    1166   1556    33.42%
> 252.eon       2443   2552     4.45%
> 253.perlbmk   2410   2586     7.28%
> 254.gap       1987   2021     1.71%
> 255.vortex    2392   2720    13.71%
> 256.bzip2     1719   1717    -0.12%
> 300.twolf     2288   2331     1.86%
>
> 2. 4.4.3 gcc with local patch FDO vs trunk O2 (6.6%)
>
> 164.gzip      1324   1317    -0.48%
> 175.vpr       1694   1758     3.76%
> 176.gcc       2293   2472     7.79%
> 181.mcf       1772   1730    -2.35%
> 186.crafty    2320   2353     1.40%
> 197.parser    1166   1652    41.70%
> 252.eon       2443   2610     6.82%
> 253.perlbmk   2410   2561     6.23%
> 254.gap       1987   1987    -0.04%
> 255.vortex    2392   2801    17.09%
> 256.bzip2     1719   1748     1.68%
> 300.twolf     2288   2335     2.04%
>
> 3. LIPO vs trunk O2 (12%)
>
> 164.gzip      1324   1350     1.99%
> 175.vpr       1694   1758     3.77%
> 176.gcc       2293   2519     9.83%
> 181.mcf       1772   1766    -0.33%
> 186.crafty    2320   2394     3.16%
> 197.parser    1166   1683    44.32%
> 252.eon       2443   2879    17.80%
> 253.perlbmk   2410   2556     6.04%
> 254.gap       1987   2139     7.61%
> 255.vortex    2392   3669    53.40%
> 256.bzip2     1719   1824     6.09%
> 300.twolf     2288   2345     2.49%
>
> 4. LTO + -fwhole-program + O2 + FDO vs O2 (10.8%)
>
> 164.gzip      1324   1340     1.25%
> 175.vpr       1694   1709     0.87%
> 176.gcc       2293   2411     5.13%
> 181.mcf       1772   1757    -0.80%
> 186.crafty    2320   2566    10.59%
> 197.parser    1166   1614    38.44%
> 252.eon       2443   2785    13.98%
> 253.perlbmk   2410   2618     8.61%
> 254.gap       1987   2063     3.81%
> 255.vortex    2392   3294    37.69%
> 256.bzip2     1719   1956    13.77%
> 300.twolf     2288   2404     5.07%
>
>
> David
>
>
> On Mon, Nov 15, 2010 at 6:18 PM, Xinliang David Li <davi...@google.com> wrote:
>> More performance data:
>>
>> -O2 -funroll-all-loops vs O2: +1.1% geomean
>>
>>                 O2   O2 unroll-all-loops
>> 164.gzip      1324   1336     0.94%
>> 175.vpr       1694   1670    -1.44%
>> 176.gcc       2293   2353     2.60%
>> 181.mcf       1772   1793     1.20%
>> 186.crafty    2320   2300    -0.86%
>> 197.parser    1166   1171     0.39%
>> 252.eon       2443   2515     2.93%
>> 253.perlbmk   2410   2250    -6.66%
>> 254.gap       1987   2041     2.68%
>> 255.vortex    2392   2411     0.78%
>> 256.bzip2     1719   1806     5.08%
>> 300.twolf     2288   2436     6.44%
>>
>>
>> -O3 -flto -fwhole-program vs -O2: geomean +6% (-fwhole-program adds ~1%)
>>
>> 164.gzip      1324   1318    -0.45%
>> 175.vpr       1694   1717     1.34%
>> 176.gcc       2293   2359     2.88%
>> 181.mcf       1772   1772     0.02%
>> 186.crafty    2320   2526     8.86%
>> 197.parser    1166   1248     7.04%
>> 252.eon       2443   2898    18.59%
>> 253.perlbmk   2410   2323    -3.62%
>> 254.gap       1987   2039     2.58%
>> 255.vortex    2392   2918    21.99%
>> 256.bzip2     1719   1946    13.19%
>> 300.twolf     2288   2342     2.34%
>>
>>
>> -O2 -flto -fwhole-program vs -O2: geomean +3.4%, mainly from three
>> programs: vortex, eon and bzip2.
>>
>> 164.gzip      1324   1313    -0.82%
>> 175.vpr       1694   1659    -2.05%
>> 176.gcc       2293   2300     0.30%
>> 181.mcf       1772   1781     0.52%
>> 186.crafty    2320   2327     0.30%
>> 197.parser    1166   1188     1.92%
>> 252.eon       2443   2664     9.00%
>> 253.perlbmk   2410   2470     2.47%
>> 254.gap       1987   1987    -0.02%
>> 255.vortex    2392   2883    20.53%
>> 256.bzip2     1719   1839     7.00%
>> 300.twolf     2288   2365     3.34%
>>
>>
>> Thanks,
>>
>> David
>>
>>
>> On Mon, Nov 15, 2010 at 5:50 PM, Jan Hubicka <hubi...@ucw.cz> wrote:
>>>> On Mon, Nov 15, 2010 at 5:39 PM, Jan Hubicka <hubi...@ucw.cz> wrote:
>>>> >> > Fortunately linker plugin solves the problem here and this is why I
>>>> >> > want to have it by default. GCC then can do effectively
>>>> >> > -fwhole-program for binaries (since linker knows what will be bound
>>>> >> > elsewhere) and take advantage of visibility((hidden)) hints for
>>>> >> > shared libraries same way. Most of the important shared libraries
>>>> >> > get visibility ((hidden)) right.
>>>> >> >
>>>> >> > It is sad that LTO w/o linker plugin doesn't give that much benefit.
>>>> >> > Ideas are welcome here.
>>>> >>
>>>> >> Linker feedback will be limited here -- mostly global variable
>>>> >> aliasing (as I remember only 2/3 spec programs benefit from it), it
>>>> >> helps. You don't get whole program points-to, whole program mod-ref
>>>> >> (with context sensitivity), whole program structure layout. The latter
>>>> >> are the real kickers (in terms of SPEC performance), but promoting LTO
>>>> >> with those numbers can be misleading as many programs won't get it.
>>>> >
>>>> > Well, I am speaking of our linker plugin here. What it does is to pass
>>>> > GCC resolution information so it knows what symbols are bound
>>>> > externally. Since typically you link LTO alone or with a small non-LTO
>>>> > part, most of the symbols are not bound and thus effectively you get
>>>> > -fwhole-program (-fwhole-program just declares everything static
>>>> > except for main ())
>>>> >
>>>> > We don't really do whole program points-to or structure layout.
>>>>
>>>> gcc will eventually, right?
>>>
>>> Sure hope so ;)
>>> We really need to solve scalability with our IPA points-to and make it
>>> compatible with WHOPR.
>>>>
>>>> > Mod-ref is just
>>>> > simple ipa-reference code. How do you get context sensitivity on mod/ref?
>>>>
>>>> mod-ref relies on points-to. With context sensitive points-to, you can
>>>> also get CS mod-ref -- basically mod-ref info per callsite.
>>>
>>> Ah sure, I was too focused on our current "mod/ref" :)
>>>
>>> Honza
>>>
>>
>
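
A minimal sketch of how the geomean figures quoted in this thread can be re-derived, assuming the geomean is taken over the per-benchmark ratios (new score / base O2 score). The thread does not spell out the exact formula, and geomean_gain is a made-up helper name. The scores are copied from the "Trunk gcc FDO vs O2" table.

```python
import math

# (trunk O2 score, trunk O2 + FDO score), copied from the
# "Trunk gcc FDO vs O2" table quoted in this thread.
scores = [
    (1324, 1302),  # 164.gzip
    (1694, 1725),  # 175.vpr
    (2293, 2387),  # 176.gcc
    (1772, 1756),  # 181.mcf
    (2320, 2280),  # 186.crafty
    (1166, 1556),  # 197.parser
    (2443, 2552),  # 252.eon
    (2410, 2586),  # 253.perlbmk
    (1987, 2021),  # 254.gap
    (2392, 2720),  # 255.vortex
    (1719, 1717),  # 256.bzip2
    (2288, 2331),  # 300.twolf
]


def geomean_gain(pairs):
    """Geometric mean of new/base ratios, as a percent gain (hypothetical helper).

    Assumption: this is how the 'geomean' numbers were produced; the
    thread itself does not state the formula.
    """
    log_sum = sum(math.log(new / base) for base, new in pairs)
    return 100.0 * (math.exp(log_sum / len(pairs)) - 1.0)


print(f"geomean gain: {geomean_gain(scores):.2f}%")
```

Under that assumption the result lands close to the quoted 5% for Experiment 1. Averaging logarithms keeps a single outlier such as 197.parser from dominating the way it would in an arithmetic mean of the percentages.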