More FDO related performance numbers

Experiment 1: trunk gcc O2 + FDO vs O2: FDO improves performance by 5% geomean
Experiment 2: our internal gcc compiler (4.4.3 based with many local patches) O2 + FDO vs O2 (trunk gcc): FDO improves perf by 6.6% geomean
Experiment 3: our internal gcc (4.4.3 with local patches) O2 + LIPO vs O2 (trunk gcc): LIPO improves by 12%
Experiment 4: trunk gcc O2 + LTO + -fwhole-program + FDO vs O2: LTO + FDO improves by 10.8%
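For readers who want to check the geomean figures, here is a minimal sketch of the arithmetic, using the Experiment 1 scores from this message. It assumes the two score columns are higher-is-better results (baseline O2 first, O2 + FDO second) and that the geomean is taken over the per-benchmark score ratios; the names `exp1_scores` and `geomean_improvement` are made up for illustration.

```python
import math

# (trunk O2 score, trunk O2 + FDO score), copied from the Experiment 1 table.
# Assumption: scores are higher-is-better.
exp1_scores = {
    "164.gzip": (1324, 1302),
    "175.vpr": (1694, 1725),
    "176.gcc": (2293, 2387),
    "181.mcf": (1772, 1756),
    "186.crafty": (2320, 2280),
    "197.parser": (1166, 1556),
    "252.eon": (2443, 2552),
    "253.perlbmk": (2410, 2586),
    "254.gap": (1987, 2021),
    "255.vortex": (2392, 2720),
    "256.bzip2": (1719, 1717),
    "300.twolf": (2288, 2331),
}


def geomean_improvement(pairs):
    """Geometric mean of tuned/base ratios, expressed as a percent gain."""
    logs = [math.log(tuned / base) for base, tuned in pairs]
    return (math.exp(sum(logs) / len(logs)) - 1.0) * 100.0


fdo_gain = geomean_improvement(list(exp1_scores.values()))
print(f"Experiment 1 geomean improvement: {fdo_gain:.1f}%")
```

Averaging the logs of the ratios keeps one large outlier (197.parser here) from dominating the result the way it would in an arithmetic mean of the percentages.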

1. Trunk gcc FDO vs O2 (5%)

                O2      O2+FDO  change
164.gzip        1324    1302    -1.64%
175.vpr         1694    1725     1.84%
176.gcc         2293    2387     4.07%
181.mcf         1772    1756    -0.88%
186.crafty      2320    2280    -1.75%
197.parser      1166    1556    33.42%
252.eon         2443    2552     4.45%
253.perlbmk     2410    2586     7.28%
254.gap         1987    2021     1.71%
255.vortex      2392    2720    13.71%
256.bzip2       1719    1717    -0.12%
300.twolf       2288    2331     1.86%

2. 4.4.3 gcc with local patch FDO vs trunk O2 (6.6%)

                O2      FDO     change
164.gzip        1324    1317    -0.48%
175.vpr         1694    1758     3.76%
176.gcc         2293    2472     7.79%
181.mcf         1772    1730    -2.35%
186.crafty      2320    2353     1.40%
197.parser      1166    1652    41.70%
252.eon         2443    2610     6.82%
253.perlbmk     2410    2561     6.23%
254.gap         1987    1987    -0.04%
255.vortex      2392    2801    17.09%
256.bzip2       1719    1748     1.68%
300.twolf       2288    2335     2.04%

3. LIPO vs trunk O2 (12%)

                O2      LIPO    change
164.gzip        1324    1350     1.99%
175.vpr         1694    1758     3.77%
176.gcc         2293    2519     9.83%
181.mcf         1772    1766    -0.33%
186.crafty      2320    2394     3.16%
197.parser      1166    1683    44.32%
252.eon         2443    2879    17.80%
253.perlbmk     2410    2556     6.04%
254.gap         1987    2139     7.61%
255.vortex      2392    3669    53.40%
256.bzip2       1719    1824     6.09%
300.twolf       2288    2345     2.49%

4. LTO + -fwhole-program + O2 + FDO vs O2 (10.8%)

                O2      LTO+FDO change
164.gzip        1324    1340     1.25%
175.vpr         1694    1709     0.87%
176.gcc         2293    2411     5.13%
181.mcf         1772    1757    -0.80%
186.crafty      2320    2566    10.59%
197.parser      1166    1614    38.44%
252.eon         2443    2785    13.98%
253.perlbmk     2410    2618     8.61%
254.gap         1987    2063     3.81%
255.vortex      2392    3294    37.69%
256.bzip2       1719    1956    13.77%
300.twolf       2288    2404     5.07%

David

On Mon, Nov 15, 2010 at 6:18 PM, Xinliang David Li <davi...@google.com> wrote:
> More performance data:
>
> -O2 -funroll-all-loops vs O2: +1.1% geomean
>
>                 O2      O2 unroll-all-loops
> 164.gzip        1324    1336     0.94%
> 175.vpr         1694    1670    -1.44%
> 176.gcc         2293    2353     2.60%
> 181.mcf         1772    1793     1.20%
> 186.crafty      2320    2300    -0.86%
> 197.parser      1166    1171     0.39%
> 252.eon         2443    2515     2.93%
> 253.perlbmk     2410    2250    -6.66%
> 254.gap         1987    2041     2.68%
> 255.vortex      2392    2411     0.78%
> 256.bzip2       1719    1806     5.08%
> 300.twolf       2288    2436     6.44%
>
>
> -O3 -flto -fwhole-program vs -O2: geomean +6% (-fwhole-program adds ~1%)
>
> 164.gzip        1324    1318    -0.45%
> 175.vpr         1694    1717     1.34%
> 176.gcc         2293    2359     2.88%
> 181.mcf         1772    1772     0.02%
> 186.crafty      2320    2526     8.86%
> 197.parser      1166    1248     7.04%
> 252.eon         2443    2898    18.59%
> 253.perlbmk     2410    2323    -3.62%
> 254.gap         1987    2039     2.58%
> 255.vortex      2392    2918    21.99%
> 256.bzip2       1719    1946    13.19%
> 300.twolf       2288    2342     2.34%
>
>
> -O2 -flto -fwhole-program vs -O2: geomean +3.4%, mainly from three
> programs: vortex, eon and bzip2.
>
> 164.gzip        1324    1313    -0.82%
> 175.vpr         1694    1659    -2.05%
> 176.gcc         2293    2300     0.30%
> 181.mcf         1772    1781     0.52%
> 186.crafty      2320    2327     0.30%
> 197.parser      1166    1188     1.92%
> 252.eon         2443    2664     9.00%
> 253.perlbmk     2410    2470     2.47%
> 254.gap         1987    1987    -0.02%
> 255.vortex      2392    2883    20.53%
> 256.bzip2       1719    1839     7.00%
> 300.twolf       2288    2365     3.34%
>
>
> Thanks,
>
> David
>
>
> On Mon, Nov 15, 2010 at 5:50 PM, Jan Hubicka <hubi...@ucw.cz> wrote:
>>> On Mon, Nov 15, 2010 at 5:39 PM, Jan Hubicka <hubi...@ucw.cz> wrote:
>>> >> > Fortunately linker plugin solves the problem here and this is why
>>> >> > I want to have it by default. GCC then can do effectively
>>> >> > -fwhole-program for binaries (since linker knows what will be
>>> >> > bound elsewhere) and take advantage of visibility((hidden)) hints
>>> >> > for shared libraries same way. Most of the important shared
>>> >> > libraries get visibility((hidden)) right.
>>> >> >
>>> >> > It is sad that LTO w/o linker plugin doesn't give that much
>>> >> > benefit. Ideas are welcome here.
>>> >>
>>> >> Linker feedback will be limited here -- mostly global variable
>>> >> aliasing (as I remember only 2/3 spec programs benefit from it), it
>>> >> helps. You don't get whole program points-to, whole program mod-ref
>>> >> (with context sensitivity), whole program structure layout. The
>>> >> latter are the real kickers (in terms of SPEC performance), but
>>> >> promoting LTO with those numbers can be misleading as many programs
>>> >> won't get it.
>>> >
>>> > Well, I am speaking of our linker plugin here. What it does is to
>>> > pass GCC resolution information so it knows what symbols are bound
>>> > externally. Since typically you link LTO alone or with small non-LTO
>>> > part, most of symbols are not bound and thus effectively you get
>>> > -fwhole-program (-fwhole-program just declares everything static
>>> > except for main ())
>>> >
>>> > We don't really do whole program points-to or structure layout.
>>>
>>> gcc will eventually, right?
>>
>> Sure hope so ;)
>> We really need to solve scalability with our IPA points-to and make it
>> compatible with WHOPR.
>>>
>>> > Mod-ref is just simple ipa-reference code. How do you get context
>>> > sensitivity on mod/ref?
>>>
>>> mod-ref relies on points-to. With context sensitive points-to, you can
>>> also get CS mod-ref -- basically mod-ref info per callsite.
>>
>> Ah sure, I was too focused on our current "mod/ref" :)
>>
>> Honza
>>
>
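
As an illustration of the "mod-ref info per callsite" remark quoted above, here is a toy sketch. It is not GCC's ipa-reference or IPA points-to code: the data layout and the names `call_sites`, `store`, `context_insensitive_mod`, and `context_sensitive_mod` are hypothetical, and the model assumes a single callee that writes through its one pointer parameter.

```python
# Toy model: a hypothetical callee `store(p)` writes through `p`.
# Each call site records what its argument may point to at that site.
call_sites = {
    "site1": {"callee": "store", "arg_points_to": {"a"}},
    "site2": {"callee": "store", "arg_points_to": {"b"}},
}


def context_insensitive_mod(sites):
    """One merged points-to set per callee, so every site gets the union."""
    merged = {}
    for info in sites.values():
        merged.setdefault(info["callee"], set()).update(info["arg_points_to"])
    return {name: set(merged[info["callee"]]) for name, info in sites.items()}


def context_sensitive_mod(sites):
    """Points-to kept per call site, so each site gets its own mod set."""
    return {name: set(info["arg_points_to"]) for name, info in sites.items()}


ci = context_insensitive_mod(call_sites)
cs = context_sensitive_mod(call_sites)
for name in sorted(call_sites):
    print(name, "CI mod:", sorted(ci[name]), "CS mod:", sorted(cs[name]))
```

In this model the context-insensitive result says both calls may modify `a` and `b`, while the per-callsite result lets an optimizer keep `b` in a register across the call at `site1`.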