Some text size measurements.

Summary:
1) LTO with -O3 bloats code considerably (+16.57% total text size on
SPEC2k; +0.35% on SPEC06 C++);
2) LTO with -O2 reduces text size compared with plain -O2 (-3.65% on
SPEC2k, -8.10% on SPEC06 C++);
3) The Google 4.4.3-based compiler is very effective at reducing C++
program size, which is where the size tuning was focused. This shows
in eon in SPEC2k and in all C++ apps in SPEC06.


Notes:
  1. -ffunction-sections -Wl,-gc-sections are used in the build.
  2. SPEC06 dealII does not build with trunk GCC because of a parsing
error.  H.J. Lu, what alternative source should be used? (It builds fine
with the 4.4.3 compiler.)
  3. xalancbmk and omnetpp do not build with the TOT GCC compiler using
FDO -- the compiler ICEs.  I will investigate when there is time.
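
Note 1 matters when reading the numbers: -ffunction-sections puts each
function in its own section, and the linker's --gc-sections option
(passed here as -Wl,-gc-sections) discards sections that nothing
reachable from the program's entry references, so dead functions do not
count toward text size. The toy Python model below sketches that
mark-and-sweep pass. The section names, call graph, and sizes are
hypothetical, and a real linker starts from the entry symbol and any
KEEP() sections rather than from main alone.

```python
from collections import deque


def live_sections(refs: dict[str, list[str]], roots: list[str]) -> set[str]:
    """Mark phase: every section reachable from the roots via references."""
    live: set[str] = set()
    work = deque(roots)
    while work:
        sec = work.popleft()
        if sec in live:
            continue
        live.add(sec)
        work.extend(refs.get(sec, []))
    return live


# Hypothetical program built with -ffunction-sections:
# section -> sections it references (calls, address-taken functions).
refs = {
    ".text.main": [".text.parse", ".text.report"],
    ".text.parse": [".text.lex"],
    ".text.report": [],
    ".text.lex": [],
    ".text.debug_dump": [".text.report"],  # never reached from main
}
sizes = {".text.main": 120, ".text.parse": 400, ".text.report": 80,
         ".text.lex": 300, ".text.debug_dump": 250}

# Simplification: treat main as the only root.
live = live_sections(refs, [".text.main"])
kept = sum(sizes[s] for s in live)
print("discarded:", sorted(set(refs) - live))  # sweep phase: unreferenced sections
print(kept, "of", sum(sizes.values()), "bytes of text kept")
```

Without -ffunction-sections, parse, lex and debug_dump would typically
share one .text section, and a single live function would keep the
whole section alive.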


David

SPEC06 C++ program data (the first data column is the TOT O2 base number,
the second is the compared variant, and the third is the change relative
to the base)

1. TOT O3 vs TOT O2 (3.35% total increase)

        471.omnetpp/    853708    867988      1.67%
         450.soplex/    643273    656349      2.03%
      483.xalancbmk/   3634416   3777600      3.94%
           444.namd/    393142    402038      2.26%
          473.astar/    102182    111038      8.67%
            size_sum   5626721   5815013      3.35%
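
In every size table here, the last column is (variant - base) / base * 100
with TOT O2 as the base, and the size_sum row applies the same formula to
the column totals. A minimal Python sketch that recomputes table 1 above
from its raw sizes:

```python
def pct_change(base: int, new: int) -> float:
    """Percent change of `new` relative to `base`."""
    return (new - base) / base * 100.0


# SPEC06 C++ text sizes from table 1: (TOT -O2 base, TOT -O3).
sizes = {
    "471.omnetpp": (853708, 867988),
    "450.soplex": (643273, 656349),
    "483.xalancbmk": (3634416, 3777600),
    "444.namd": (393142, 402038),
    "473.astar": (102182, 111038),
}

for name, (base, new) in sizes.items():
    print(f"{name + '/':>20} {base:>9} {new:>9} {pct_change(base, new):>9.2f}%")

# size_sum is the change in total size, not an average of the percentages.
base_sum = sum(b for b, _ in sizes.values())
new_sum = sum(n for _, n in sizes.values())
print(f"{'size_sum':>20} {base_sum:>9} {new_sum:>9} "
      f"{pct_change(base_sum, new_sum):>9.2f}%")
```

Because size_sum is a ratio of totals, large programs dominate it:
483.xalancbmk is about 65% of the SPEC06 base total, which is why table 4
shows only a 13.95% total reduction even though the other four programs
shrink by 16% to 65%.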

2. TOT LTO + -fwhole-program + O3 vs TOT O2 (0.35% total increase)

        471.omnetpp/    853708    937728      9.84%
         450.soplex/    643273    654057      1.68%
      483.xalancbmk/   3634416   3540646     -2.58%
           444.namd/    393142    401318      2.08%
          473.astar/    102182    112538     10.13%
            size_sum   5626721   5646287      0.35%

3. TOT LTO + -fwhole-program + O2 vs TOT O2 (8.10% total reduction)

        471.omnetpp/    853708    822868     -3.61%
         450.soplex/    643273    611653     -4.92%
      483.xalancbmk/   3634416   3245157    -10.71%
           444.namd/    393142    391698     -0.37%
          473.astar/    102182     99586     -2.54%
            size_sum   5626721   5170962     -8.10%

4. Google 4.4.3 compiler O2 vs TOT O2 (13.95% total reduction)

        471.omnetpp/    853708    545840    -36.06%
         450.soplex/    643273    374674    -41.76%
      483.xalancbmk/   3634416   3556306     -2.15%
           444.namd/    393142    329897    -16.09%
          473.astar/    102182     35301    -65.45%
            size_sum   5626721   4842018    -13.95%

5. Google 4.4.3 compiler O2 FDO vs TOT O2 (24.81% total reduction)

        471.omnetpp/    853708    514732    -39.71%
         450.soplex/    643273    357426    -44.44%
      483.xalancbmk/   3634416   2985761    -17.85%
           444.namd/    393142    332806    -15.35%
          473.astar/    102182     39797    -61.05%
            size_sum   5626721   4230522    -24.81%

6. Google 4.4.3 compiler O2 LIPO vs TOT O2 (20.86% total reduction)

        471.omnetpp/    853708    559944    -34.41%
         450.soplex/    643273    393399    -38.84%
      483.xalancbmk/   3634416   3126428    -13.98%
           444.namd/    393142    334666    -14.87%
          473.astar/    102182     38749    -62.08%
            size_sum   5626721   4453186    -20.86%



SPEC2k text size data:

1. TOT O1 vs TOT O2 (4.48% total reduction)

          300.twolf/    182884    177223     -3.10%
            181.mcf/     11794     11338     -3.87%
           164.gzip/     36705     34388     -6.31%
         186.crafty/    171663    164898     -3.94%
         255.vortex/    463463    456034     -1.60%
          256.bzip2/     28803     28091     -2.47%
            176.gcc/   1422042   1368365     -3.77%
         197.parser/    103225     96644     -6.38%
        253.perlbmk/    563927    515898     -8.52%
            175.vpr/    139321    134316     -3.59%
            252.eon/    607704    591780     -2.62%
            254.gap/    496262    459593     -7.39%
            size_sum   4227793   4038568     -4.48%


2. TOT O3 vs TOT O2 (10.82% total size increase)

          300.twolf/    182884    194620      6.42%
            181.mcf/     11794     13290     12.68%
           164.gzip/     36705     46049     25.46%
         186.crafty/    171663    189892     10.62%
         255.vortex/    463463    495875      6.99%
          256.bzip2/     28803     39939     38.66%
            176.gcc/   1422042   1609786     13.20%
         197.parser/    103225    143558     39.07%
        253.perlbmk/    563927    616855      9.39%
            175.vpr/    139321    147081      5.57%
            252.eon/    607704    625176      2.88%
            254.gap/    496262    563187     13.49%
            size_sum   4227793   4685308     10.82%


3. TOT LTO + -fwhole-program + -O2 vs TOT O2 (3.65% total size reduction)

          300.twolf/    182884    176572     -3.45%
            181.mcf/     11794      9594    -18.65%
           164.gzip/     36705     34439     -6.17%
         186.crafty/    171663    173071      0.82%
         255.vortex/    463463    382157    -17.54%
          256.bzip2/     28803     27142     -5.77%
            176.gcc/   1422042   1364796     -4.03%
         197.parser/    103225     94997     -7.97%
        253.perlbmk/    563927    590087      4.64%
            175.vpr/    139321    123572    -11.30%
            252.eon/    607704    606226     -0.24%
            254.gap/    496262    491006     -1.06%
            size_sum   4227793   4073659     -3.65%


4. TOT LTO + -fwhole-program + -O3 vs TOT O2 (16.57% total increase)

          300.twolf/    182884    196316      7.34%
            181.mcf/     11794     11402     -3.32%
           164.gzip/     36705     51477     40.25%
         186.crafty/    171663    214700     25.07%
         255.vortex/    463463    462329     -0.24%
          256.bzip2/     28803     34950     21.34%
            176.gcc/   1422042   1724868     21.30%
         197.parser/    103225    124698     20.80%
        253.perlbmk/    563927    729119     29.29%
            175.vpr/    139321    139729      0.29%
            252.eon/    607704    627194      3.21%
            254.gap/    496262    611515     23.22%
            size_sum   4227793   4928297     16.57%

5. TOT O2 FDO vs TOT O2 (1.15% total increase)

          300.twolf/    182884    178247     -2.54%
            181.mcf/     11794     17370     47.28%
           164.gzip/     36705     42889     16.85%
         186.crafty/    171663    184085      7.24%
         255.vortex/    463463    483428      4.31%
          256.bzip2/     28803     33635     16.78%
            176.gcc/   1422042   1441797      1.39%
         197.parser/    103225    140401     36.01%
        253.perlbmk/    563927    546447     -3.10%
            175.vpr/    139321    147153      5.62%
            252.eon/    607704    572388     -5.81%
            254.gap/    496262    488758     -1.51%
            size_sum   4227793   4276598      1.15%


6. Google local compiler O2 FDO vs TOT O2 (6.33% total increase)

Note the large reduction in the C++ program's text size (252.eon),
which is where the size tuning was done.

          300.twolf/    182884    184736      1.01%
            181.mcf/     11794     26560    125.20%
           164.gzip/     36705     48499     32.13%
         186.crafty/    171663    187406      9.17%
         255.vortex/    463463    482090      4.02%
          256.bzip2/     28803     37905     31.60%
            176.gcc/   1422042   1729480     21.62%
         197.parser/    103225    237148    129.74%
        253.perlbmk/    563927    557040     -1.22%
            175.vpr/    139321    153453     10.14%
            252.eon/    607704    312506    -48.58%
            254.gap/    496262    538534      8.52%
            size_sum   4227793   4495357      6.33%

Also for reference, the Google compiler vanilla O2 vs TOT O2 (1.97% total
increase): a large reduction in C++ size, with a small overall size increase.

          300.twolf/    182884    207829     13.64%
            181.mcf/     11794     12008      1.81%
           164.gzip/     36705     41528     13.14%
         186.crafty/    171663    177104      3.17%
         255.vortex/    463463    473298      2.12%
          256.bzip2/     28803     37961     31.80%
            176.gcc/   1422042   1592952     12.02%
         197.parser/    103225    139969     35.60%
        253.perlbmk/    563927    598632      6.15%
            175.vpr/    139321    156869     12.60%
            252.eon/    607704    322478    -46.94%
            254.gap/    496262    550451     10.92%
            size_sum   4227793   4311079      1.97%


7. LIPO vs TOT O2 (23.32% total increase)

          300.twolf/    182884    185960      1.68%
            181.mcf/     11794     26544    125.06%
           164.gzip/     36705     54827     49.37%
         186.crafty/    171663    234494     36.60%
         255.vortex/    463463    596394     28.68%
          256.bzip2/     28803     40492     40.58%
            176.gcc/   1422042   2070851     45.63%
         197.parser/    103225    250537    142.71%
        253.perlbmk/    563927    638320     13.19%
            175.vpr/    139321    156117     12.06%
            252.eon/    607704    370949    -38.96%
            254.gap/    496262    588139     18.51%
            size_sum   4227793   5213624     23.32%

8. TOT LTO + -fwhole-program + O2 + FDO vs TOT O2 (0.37% total increase)

          300.twolf/    182884    174919     -4.36%
            181.mcf/     11794     16346     38.60%
           164.gzip/     36705     40743     11.00%
         186.crafty/    171663    197698     15.17%
         255.vortex/    463463    395626    -14.64%
          256.bzip2/     28803     36238     25.81%
            176.gcc/   1422042   1439295      1.21%
         197.parser/    103225    143237     38.76%
        253.perlbmk/    563927    590687      4.75%
            175.vpr/    139321    135276     -2.90%
            252.eon/    607704    585954     -3.58%
            254.gap/    496262    487289     -1.81%
            size_sum   4227793   4243308      0.37%
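
The performance mails quoted below report geomean gains. These are
geometric means of per-benchmark score ratios (variant score divided by
trunk -O2 score). A minimal Python sketch using the scores from quoted
experiment 1 (trunk -O2 + FDO vs trunk -O2) comes out at about 5%, in line
with the figure given in that mail:

```python
import math


def geomean_gain(pairs) -> float:
    """Geometric mean of new/base score ratios, expressed as a percent gain."""
    ratios = [new / base for base, new in pairs]
    log_mean = sum(math.log(r) for r in ratios) / len(ratios)
    return (math.exp(log_mean) - 1.0) * 100.0


# SPEC2k scores from quoted experiment 1: (trunk -O2, trunk -O2 + FDO).
fdo = {
    "164.gzip": (1324, 1302),
    "175.vpr": (1694, 1725),
    "176.gcc": (2293, 2387),
    "181.mcf": (1772, 1756),
    "186.crafty": (2320, 2280),
    "197.parser": (1166, 1556),
    "252.eon": (2443, 2552),
    "253.perlbmk": (2410, 2586),
    "254.gap": (1987, 2021),
    "255.vortex": (2392, 2720),
    "256.bzip2": (1719, 1717),
    "300.twolf": (2288, 2331),
}

gain = geomean_gain(fdo.values())
print(f"FDO geomean gain over -O2: {gain:.1f}%")
```

Because the mean is taken over log ratios, parser's +33% pulls the result
up less than it would in an arithmetic mean of the percentage column.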


On Tue, Nov 16, 2010 at 12:26 AM, Xinliang David Li <davi...@google.com> wrote:
> More FDO-related performance numbers:
>
> Experiment 1: trunk gcc O2 + FDO vs O2: FDO improves performance by
> 5% geomean
> Experiment 2: our internal gcc compiler (4.4.3-based with many local
> patches) O2 + FDO vs O2 (trunk gcc): FDO improves performance by 6.6%
> geomean
> Experiment 3: our internal gcc (4.4.3 with local patches) O2 + LIPO vs
> O2 (trunk gcc): LIPO improves performance by 12%
> Experiment 4: trunk gcc O2 + LTO + -fwhole-program + FDO vs O2: LTO +
> FDO improves performance by 10.8%
>
>
> 1. Trunk gcc FDO vs O2  (5%)
>
>            164.gzip                1324                1302     -1.64%
>             175.vpr                1694                1725      1.84%
>             176.gcc                2293                2387      4.07%
>             181.mcf                1772                1756     -0.88%
>          186.crafty                2320                2280     -1.75%
>          197.parser                1166                1556     33.42%
>             252.eon                2443                2552      4.45%
>         253.perlbmk                2410                2586      7.28%
>             254.gap                1987                2021      1.71%
>          255.vortex                2392                2720     13.71%
>           256.bzip2                1719                1717     -0.12%
>           300.twolf                2288                2331      1.86%
>
> 2. 4.4.3 gcc with local patches FDO vs trunk O2 (6.6%)
>
>            164.gzip                1324                1317     -0.48%
>             175.vpr                1694                1758      3.76%
>             176.gcc                2293                2472      7.79%
>             181.mcf                1772                1730     -2.35%
>          186.crafty                2320                2353      1.40%
>          197.parser                1166                1652     41.70%
>             252.eon                2443                2610      6.82%
>         253.perlbmk                2410                2561      6.23%
>             254.gap                1987                1987     -0.04%
>          255.vortex                2392                2801     17.09%
>           256.bzip2                1719                1748      1.68%
>           300.twolf                2288                2335      2.04%
>
> 3. LIPO vs trunk O2 (12%)
>
>            164.gzip                1324                1350      1.99%
>             175.vpr                1694                1758      3.77%
>             176.gcc                2293                2519      9.83%
>             181.mcf                1772                1766     -0.33%
>          186.crafty                2320                2394      3.16%
>          197.parser                1166                1683     44.32%
>             252.eon                2443                2879     17.80%
>         253.perlbmk                2410                2556      6.04%
>             254.gap                1987                2139      7.61%
>          255.vortex                2392                3669     53.40%
>           256.bzip2                1719                1824      6.09%
>           300.twolf                2288                2345      2.49%
>
> 4. LTO + -fwhole-program + O2 + FDO vs O2 (10.8%)
>
>            164.gzip                1324                1340      1.25%
>             175.vpr                1694                1709      0.87%
>             176.gcc                2293                2411      5.13%
>             181.mcf                1772                1757     -0.80%
>          186.crafty                2320                2566     10.59%
>          197.parser                1166                1614     38.44%
>             252.eon                2443                2785     13.98%
>         253.perlbmk                2410                2618      8.61%
>             254.gap                1987                2063      3.81%
>          255.vortex                2392                3294     37.69%
>           256.bzip2                1719                1956     13.77%
>           300.twolf                2288                2404      5.07%
>
>
> David
>
>
> On Mon, Nov 15, 2010 at 6:18 PM, Xinliang David Li <davi...@google.com> wrote:
>> More performance data:
>>
>> -O2 -funroll-all-loops vs O2:   +1.1% geomean
>>
>>                                          O2               O2 unroll-all-loops
>>            164.gzip                1324                1336      0.94%
>>             175.vpr                1694                1670     -1.44%
>>             176.gcc                2293                2353      2.60%
>>             181.mcf                1772                1793      1.20%
>>          186.crafty                2320                2300     -0.86%
>>          197.parser                1166                1171      0.39%
>>             252.eon                2443                2515      2.93%
>>         253.perlbmk                2410                2250     -6.66%
>>             254.gap                1987                2041      2.68%
>>          255.vortex                2392                2411      0.78%
>>           256.bzip2                1719                1806      5.08%
>>           300.twolf                2288                2436      6.44%
>>
>>
>> -O3 -flto -fwhole-program vs -O2: geomean +6% (-fwhole-program adds ~1%)
>>
>>            164.gzip                1324                1318     -0.45%
>>             175.vpr                1694                1717      1.34%
>>             176.gcc                2293                2359      2.88%
>>             181.mcf                1772                1772      0.02%
>>          186.crafty                2320                2526      8.86%
>>          197.parser                1166                1248      7.04%
>>             252.eon                2443                2898     18.59%
>>         253.perlbmk                2410                2323     -3.62%
>>             254.gap                1987                2039      2.58%
>>          255.vortex                2392                2918     21.99%
>>           256.bzip2                1719                1946     13.19%
>>           300.twolf                2288                2342      2.34%
>>
>>
>> -O2 -flto -fwhole-program vs -O2: geomean +3.4%, mainly from three
>> programs: vortex, eon and bzip2.
>>
>>            164.gzip                1324                1313     -0.82%
>>             175.vpr                1694                1659     -2.05%
>>             176.gcc                2293                2300      0.30%
>>             181.mcf                1772                1781      0.52%
>>          186.crafty                2320                2327      0.30%
>>          197.parser                1166                1188      1.92%
>>             252.eon                2443                2664      9.00%
>>         253.perlbmk                2410                2470      2.47%
>>             254.gap                1987                1987     -0.02%
>>          255.vortex                2392                2883     20.53%
>>           256.bzip2                1719                1839      7.00%
>>           300.twolf                2288                2365      3.34%
>>
>>
>> Thanks,
>>
>> David
>>
>>
>> On Mon, Nov 15, 2010 at 5:50 PM, Jan Hubicka <hubi...@ucw.cz> wrote:
>>>> On Mon, Nov 15, 2010 at 5:39 PM, Jan Hubicka <hubi...@ucw.cz> wrote:
>>>> >> > Fortunately the linker plugin solves the problem here, and this is
>>>> >> > why I want to have it by default.  GCC can then effectively do
>>>> >> > -fwhole-program for binaries (since the linker knows what will be
>>>> >> > bound elsewhere) and take advantage of visibility((hidden)) hints for
>>>> >> > shared libraries the same way.  Most important shared libraries get
>>>> >> > visibility((hidden)) right.
>>>> >> >
>>>> >> > It is sad that LTO w/o the linker plugin doesn't give that many
>>>> >> > benefits.  Ideas are welcome here.
>>>> >>
>>>> >> Linker feedback will be limited here -- mostly global variable
>>>> >> aliasing (as I remember, only 2 or 3 SPEC programs benefit from it).
>>>> >> It helps, but you don't get whole-program points-to, whole-program
>>>> >> mod-ref (with context sensitivity), or whole-program structure
>>>> >> layout.  The latter are the real kickers (in terms of SPEC
>>>> >> performance), but promoting LTO with those numbers can be
>>>> >> misleading, as many programs won't get them.
>>>> >
>>>> > Well, I am speaking of our linker plugin here.  What it does is pass
>>>> > GCC resolution information so it knows which symbols are bound
>>>> > externally.  Since you typically link LTO objects alone or with a
>>>> > small non-LTO part, most symbols are not bound externally and thus
>>>> > you effectively get -fwhole-program (-fwhole-program just declares
>>>> > everything static except for main ()).
>>>> >
>>>> > We don't really do whole-program points-to or structure layout.
>>>>
>>>> gcc will eventually, right?
>>>
>>> Sure hope so ;)
>>> We really need to solve the scalability of our IPA points-to and
>>> make it compatible with WHOPR.
>>>>
>>>> > Mod-ref is just the simple ipa-reference code.  How do you get
>>>> > context sensitivity on mod/ref?
>>>>
>>>> mod-ref relies on points-to. With context-sensitive points-to, you
>>>> can also get CS mod-ref -- basically mod-ref info per call site.
>>>
>>> Ah sure, I was too focused on our current "mod/ref" :)
>>>
>>> Honza
>>>
>>
>
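
The linker plugin discussion above can be made concrete with a toy model
(not GCC's implementation): -fwhole-program treats every symbol except
main as static, while the linker plugin's resolution information lets
GCC localize only the symbols that nothing outside the LTO objects
references. The symbol names below, and the assumption that a non-LTO
object calls callback, are hypothetical.

```python
# Hypothetical LTO link: symbols defined in the LTO objects.
lto_symbols = {"main", "parse", "helper", "callback"}
# Assumed linker resolution: main is referenced by the C runtime startup
# code, callback by a non-LTO object file; nothing outside uses the rest.
referenced_outside = {"main", "callback"}


def internalized_whole_program(symbols):
    """-fwhole-program: treat every symbol except main as static."""
    return {s for s in symbols if s != "main"}


def internalized_with_plugin(symbols, referenced_outside):
    """Linker plugin: localize only symbols nothing outside the LTO unit references."""
    return {s for s in symbols if s not in referenced_outside}


print(sorted(internalized_whole_program(lto_symbols)))
print(sorted(internalized_with_plugin(lto_symbols, referenced_outside)))
```

The difference is the callback case: under -fwhole-program it would be
made local even though a non-LTO object references it, unless it is
marked with the externally_visible attribute, whereas the plugin's
resolution data keeps it global automatically.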

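The per-call-site mod-ref idea in the last exchange can be illustrated
with a toy model (not GCC's ipa-reference or points-to code). The C
program in the comments is hypothetical: clear() stores through its
pointer argument and is called with &a at one site and &b at another. A
context-insensitive analysis merges the points-to sets of p across call
sites, so each call appears to modify both a and b; a context-sensitive
one keeps a mod set per call site.

```python
# Hypothetical C program being analyzed:
#   int a, b;
#   void clear(int *p) { *p = 0; }
#   ... clear(&a);   /* call site c1 */
#   ... clear(&b);   /* call site c2 */

# Points-to set of each pointer parameter of clear() at each call site.
callsites: dict[str, dict[str, set[str]]] = {
    "c1": {"p": {"a"}},
    "c2": {"p": {"b"}},
}
# Parameters that clear() stores through.
stores_through = {"p"}


def mod_insensitive(callsites, stores_through):
    """One summary for clear(): merge points-to sets over all call sites."""
    merged: set[str] = set()
    for binding in callsites.values():
        for param in stores_through:
            merged |= binding[param]
    # Every call site gets the same merged mod set.
    return {site: set(merged) for site in callsites}


def mod_sensitive(callsites, stores_through):
    """Per-call-site summary: only objects the arguments at that site point to."""
    return {
        site: set().union(*(binding[param] for param in stores_through))
        for site, binding in callsites.items()
    }


print(mod_insensitive(callsites, stores_through))
print(mod_sensitive(callsites, stores_through))
```

With the context-sensitive result, a load of b after call site c1 can
reuse a value loaded before the call, which the merged summary forbids.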