Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

Xinliang David Li Tue, 16 Nov 2010 00:26:53 -0800

More FDO related performance numbers

Experiment 1: trunk gcc, O2 + FDO vs O2: FDO improves performance
by 5% geomean
Experiment 2: our internal gcc compiler (4.4.3 based with many local
patches), O2 + FDO vs O2 (trunk gcc): FDO improves performance by 6.6%
geomean
Experiment 3: our internal gcc (4.4.3 with local patches), O2 + LIPO vs
O2 (trunk gcc): LIPO improves performance by 12% geomean
Experiment 4: trunk gcc, O2 + LTO + -fwhole-program + FDO vs O2: LTO +
FDO improves performance by 10.8% geomean
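
For readers who want to set up configurations like these, a generic two-step FDO build can be sketched as below. This is only a sketch under assumptions: the source file names, the output name and the bare training run are hypothetical, and the exact options behind the numbers in this message are not stated here. -fprofile-generate, -fprofile-use, -flto and -fwhole-program are the standard GCC options.

```python
import shlex


def fdo_build_steps(sources, output, extra_flags=()):
    """Return the three command lines of a generic FDO build as argv lists.

    Assumption: a plain instrument / train / rebuild cycle. This is not the
    SPEC harness configuration used for the numbers in this message.
    """
    base = ["gcc", "-O2", *extra_flags]
    # Step 1: build an instrumented binary that writes profile data when run.
    instrument = [*base, "-fprofile-generate", *sources, "-o", output]
    # Step 2: run the instrumented binary on a representative training input.
    train = ["./" + output]
    # Step 3: rebuild with the same flags, now consuming the collected profile.
    rebuild = [*base, "-fprofile-use", *sources, "-o", output]
    return [instrument, train, rebuild]


# Hypothetical file names, for illustration only.
plain_fdo = fdo_build_steps(["a.c", "b.c"], "bench")
lto_fdo = fdo_build_steps(["a.c", "b.c"], "bench",
                          extra_flags=("-flto", "-fwhole-program"))

for argv in plain_fdo + lto_fdo:
    print(shlex.join(argv))
```

The commands are only printed, not executed, so the sketch runs without a compiler installed.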



1. Trunk gcc FDO vs O2 (5%)

                                trunk O2      trunk O2 + FDO     change
            164.gzip                1324                1302     -1.64%
             175.vpr                1694                1725      1.84%
             176.gcc                2293                2387      4.07%
             181.mcf                1772                1756     -0.88%
          186.crafty                2320                2280     -1.75%
          197.parser                1166                1556     33.42%
             252.eon                2443                2552      4.45%
         253.perlbmk                2410                2586      7.28%
             254.gap                1987                2021      1.71%
          255.vortex                2392                2720     13.71%
           256.bzip2                1719                1717     -0.12%
           300.twolf                2288                2331      1.86%

2. gcc 4.4.3 with local patches, FDO vs trunk O2 (6.6%)

                                trunk O2      4.4.3 O2 + FDO     change
            164.gzip                1324                1317     -0.48%
             175.vpr                1694                1758      3.76%
             176.gcc                2293                2472      7.79%
             181.mcf                1772                1730     -2.35%
          186.crafty                2320                2353      1.40%
          197.parser                1166                1652     41.70%
             252.eon                2443                2610      6.82%
         253.perlbmk                2410                2561      6.23%
             254.gap                1987                1987     -0.04%
          255.vortex                2392                2801     17.09%
           256.bzip2                1719                1748      1.68%
           300.twolf                2288                2335      2.04%

3. LIPO vs trunk O2 (12%)

                                trunk O2     4.4.3 O2 + LIPO     change
            164.gzip                1324                1350      1.99%
             175.vpr                1694                1758      3.77%
             176.gcc                2293                2519      9.83%
             181.mcf                1772                1766     -0.33%
          186.crafty                2320                2394      3.16%
          197.parser                1166                1683     44.32%
             252.eon                2443                2879     17.80%
         253.perlbmk                2410                2556      6.04%
             254.gap                1987                2139      7.61%
          255.vortex                2392                3669     53.40%
           256.bzip2                1719                1824      6.09%
           300.twolf                2288                2345      2.49%

4. LTO + -fwhole-program + O2 + FDO vs O2 (10.8%)

                                trunk O2      O2 + LTO + FDO     change
            164.gzip                1324                1340      1.25%
             175.vpr                1694                1709      0.87%
             176.gcc                2293                2411      5.13%
             181.mcf                1772                1757     -0.80%
          186.crafty                2320                2566     10.59%
          197.parser                1166                1614     38.44%
             252.eon                2443                2785     13.98%
         253.perlbmk                2410                2618      8.61%
             254.gap                1987                2063      3.81%
          255.vortex                2392                3294     37.69%
           256.bzip2                1719                1956     13.77%
           300.twolf                2288                2404      5.07%
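
The geomean figures quoted for the four experiments can be rechecked from the per-benchmark numbers in the tables. The sketch below assumes the geomean gain is the geometric mean of the per-benchmark ratios (new score / O2 score) minus one; the variable and function names are illustrative. Because the tabulated scores are rounded, the results should land close to the quoted percentages rather than match them exactly.

```python
import math

# O2 baseline column, in table order (164.gzip ... 300.twolf).
BASELINE = [1324, 1694, 2293, 1772, 2320, 1166,
            2443, 2410, 1987, 2392, 1719, 2288]

# Second column of each of the four tables, same benchmark order.
RESULTS = {
    "1. trunk FDO": [1302, 1725, 2387, 1756, 2280, 1556,
                     2552, 2586, 2021, 2720, 1717, 2331],
    "2. 4.4.3 FDO": [1317, 1758, 2472, 1730, 2353, 1652,
                     2610, 2561, 1987, 2801, 1748, 2335],
    "3. LIPO": [1350, 1758, 2519, 1766, 2394, 1683,
                2879, 2556, 2139, 3669, 1824, 2345],
    "4. LTO + FDO": [1340, 1709, 2411, 1757, 2566, 1614,
                     2785, 2618, 2063, 3294, 1956, 2404],
}


def geomean_gain_percent(baseline, scores):
    """Geometric mean of per-benchmark ratios, expressed as a percent gain."""
    log_ratios = [math.log(s / b) for s, b in zip(scores, baseline)]
    mean_log = math.fsum(log_ratios) / len(log_ratios)
    return (math.exp(mean_log) - 1.0) * 100.0


for name, scores in RESULTS.items():
    print(f"{name}: {geomean_gain_percent(BASELINE, scores):+.1f}%")
```

Averaging the logarithms keeps one large outlier such as 255.vortex or 197.parser from dominating the summary the way it would in an arithmetic mean of the percentages.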


David


On Mon, Nov 15, 2010 at 6:18 PM, Xinliang David Li <davi...@google.com> wrote:
> More performance data:
>
> -O2 -funroll-all-loops vs O2:   +1.1% geomean
>
>                                          O2               O2 unroll-all-loops
>            164.gzip                1324                1336      0.94%
>             175.vpr                1694                1670     -1.44%
>             176.gcc                2293                2353      2.60%
>             181.mcf                1772                1793      1.20%
>          186.crafty                2320                2300     -0.86%
>          197.parser                1166                1171      0.39%
>             252.eon                2443                2515      2.93%
>         253.perlbmk                2410                2250     -6.66%
>             254.gap                1987                2041      2.68%
>          255.vortex                2392                2411      0.78%
>           256.bzip2                1719                1806      5.08%
>           300.twolf                2288                2436      6.44%
>
>
> -O3 -flto -fwhole-program vs -O2: geomean +6% (-fwhole-program adds ~1%)
>
>            164.gzip                1324                1318     -0.45%
>             175.vpr                1694                1717      1.34%
>             176.gcc                2293                2359      2.88%
>             181.mcf                1772                1772      0.02%
>          186.crafty                2320                2526      8.86%
>          197.parser                1166                1248      7.04%
>             252.eon                2443                2898     18.59%
>         253.perlbmk                2410                2323     -3.62%
>             254.gap                1987                2039      2.58%
>          255.vortex                2392                2918     21.99%
>           256.bzip2                1719                1946     13.19%
>           300.twolf                2288                2342      2.34%
>
>
> -O2 -flto -fwhole-program vs -O2: geomean +3.4%, mainly from three
> programs: vortex, eon and bzip2.
>
>            164.gzip                1324                1313     -0.82%
>             175.vpr                1694                1659     -2.05%
>             176.gcc                2293                2300      0.30%
>             181.mcf                1772                1781      0.52%
>          186.crafty                2320                2327      0.30%
>          197.parser                1166                1188      1.92%
>             252.eon                2443                2664      9.00%
>         253.perlbmk                2410                2470      2.47%
>             254.gap                1987                1987     -0.02%
>          255.vortex                2392                2883     20.53%
>           256.bzip2                1719                1839      7.00%
>           300.twolf                2288                2365      3.34%
>
>
> Thanks,
>
> David
>
>
> On Mon, Nov 15, 2010 at 5:50 PM, Jan Hubicka <hubi...@ucw.cz> wrote:
>>> On Mon, Nov 15, 2010 at 5:39 PM, Jan Hubicka <hubi...@ucw.cz> wrote:
>>> >> > Fortunately the linker plugin solves the problem here, and this is why
>>> >> > I want to have it by default.  GCC can then effectively do
>>> >> > -fwhole-program for binaries (since the linker knows what will be bound
>>> >> > elsewhere) and take advantage of visibility((hidden)) hints for shared
>>> >> > libraries in the same way.  Most of the important shared libraries get
>>> >> > visibility((hidden)) right.
>>> >> >
>>> >> > It is sad that LTO w/o linker plugin doesn't give that much benefit.
>>> >> > Ideas are welcome here.
>>> >>
>>> >> Linker feedback will be limited here -- it mostly helps global variable
>>> >> aliasing (as I remember only 2/3 spec programs benefit from it).  You
>>> >> don't get whole program points-to, whole program mod-ref (with context
>>> >> sensitivity), or whole program structure layout. The latter are the real
>>> >> kickers (in terms of SPEC performance), but promoting LTO with those
>>> >> numbers can be misleading as many programs won't get it.
>>> >
>>> > Well, I am speaking of our linker plugin here.  What it does is pass GCC
>>> > resolution information so it knows what symbols are bound externally.
>>> > Since typically you link LTO alone or with a small non-LTO part, most
>>> > symbols are not bound externally and thus effectively you get
>>> > -fwhole-program (-fwhole-program just declares everything static except
>>> > for main ()).
>>> >
>>> > We don't really do whole program points-to or structure layout.
>>>
>>> gcc will eventually, right?
>>
>> Sure hope so ;)
>> We really need to solve scalability with our IPA points-to and make it
>> compatible with WHOPR.
>>>
>>> > Mod-ref is just
>>> > simple ipa-reference code. How do you get context sensitivity on mod/ref?
>>>
>>> mod-ref relies on points-to. With context sensitive points-to, you can
>>> also get CS mod-ref -- basically mod-ref info per callsite.
>>
>> Ah sure, I was too focused on our current "mod/ref" :)
>>
>> Honza
>>
>
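
Jan's description of the linker plugin in the quoted thread (the linker tells GCC which symbols are bound externally, and everything else can be treated as if it were static) can be illustrated with a toy model. This is not the plugin API or GCC's implementation; the function and symbol names are made up for illustration.

```python
def localizable_symbols(lto_defined, externally_bound, entry="main"):
    """Symbols the compiler could treat as static in this toy model.

    lto_defined:      symbols defined in objects compiled with LTO
    externally_bound: symbols the linker reports as referenced from non-LTO
                      code (the "resolution information" in the thread)
    entry:            the program entry point, which always stays visible
    """
    return {s for s in lto_defined
            if s != entry and s not in externally_bound}


# Hypothetical program: api_init is also called from a small non-LTO object.
lto_defined = {"main", "parse", "helper", "api_init"}

# With linker resolution information, only api_init must stay visible.
plugin_view = localizable_symbols(lto_defined, externally_bound={"api_init"})

# -fwhole-program asserts nothing is bound externally: all but main is local.
whole_program_view = localizable_symbols(lto_defined, externally_bound=set())

print(sorted(plugin_view))
print(sorted(whole_program_view))
```

When the non-LTO part of a link is small, the two sets differ by only a few symbols, which is the sense in which the plugin gives "effectively" -fwhole-program.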
