Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

Xinliang David Li Mon, 15 Nov 2010 14:26:04 -0800

I did some measurements (64-bit).

Experiment 1: -O2 -funroll-loops vs -O2

Unrolling improves performance (geomean) by only 0.56%:

                                      O2   O2 -funroll-loops     change
            164.gzip                1324                1331      0.56%
             175.vpr                1694                1605     -5.24%
             176.gcc                2293                2350      2.47%
             181.mcf                1772                1788      0.90%
          186.crafty                2320                2326      0.26%
          197.parser                1166                1162     -0.32%
             252.eon                2443                2529      3.50%
         253.perlbmk                2410                2460      2.07%
             254.gap                1987                2019      1.58%
          255.vortex                2392                2406      0.58%
           256.bzip2                1719                1715     -0.25%
           300.twolf                2288                2308      0.88%
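
The geomean figure can be cross-checked from the table itself. Below is a
minimal Python sketch that recomputes it from the rounded scores listed above.
The names `scores` and `geomean_gain_percent` are illustrative only and do not
come from any SPEC tooling. It assumes the numbers are SPEC ratios where higher
is better, which is inferred from the signs in the table. Because the listed
scores are rounded, the result may differ from the quoted figure in the last
digit.

```python
import math

# Scores copied from the Experiment 1 table: (O2, O2 + -funroll-loops).
# Assumption: these are SPEC ratios, so a higher number is better.
scores = {
    "164.gzip": (1324, 1331),
    "175.vpr": (1694, 1605),
    "176.gcc": (2293, 2350),
    "181.mcf": (1772, 1788),
    "186.crafty": (2320, 2326),
    "197.parser": (1166, 1162),
    "252.eon": (2443, 2529),
    "253.perlbmk": (2410, 2460),
    "254.gap": (1987, 2019),
    "255.vortex": (2392, 2406),
    "256.bzip2": (1719, 1715),
    "300.twolf": (2288, 2308),
}


def geomean_gain_percent(pairs):
    """Geometric mean of the new/base ratios, expressed as a percent gain."""
    pairs = list(pairs)
    # Averaging the logs is equivalent to taking the n-th root of the product.
    log_sum = sum(math.log(new / base) for base, new in pairs)
    return (math.exp(log_sum / len(pairs)) - 1.0) * 100.0


gain = geomean_gain_percent(scores.values())
print(f"geomean gain from unrolling: {gain:.2f}%")
```

The same helper applies unchanged to the other experiments if their score
pairs are substituted in.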


Experiment 2: O3 vs O2

The improvement on SPEC2k (geomean 2.38%) is larger than on the large
internal programs tested:

                                      O2                  O3     change
            164.gzip                1324                1329      0.40%
             175.vpr                1694                1700      0.31%
             176.gcc                2293                2336      1.89%
             181.mcf                1772                1739     -1.81%
          186.crafty                2320                2323      0.14%
          197.parser                1166                1252      7.39%
             252.eon                2443                2645      8.23%
         253.perlbmk                2410                2452      1.74%
             254.gap                1987                2020      1.62%
          255.vortex                2392                2473      3.39%
           256.bzip2                1719                1766      2.74%
           300.twolf                2288                2350      2.70%

Experiment 3: O2 LTO vs O2 (geomean 0.72%)

                                      O2              O2 LTO     change
            164.gzip                1324                1317     -0.53%
             175.vpr                1694                1697      0.18%
             176.gcc                2293                2291     -0.08%
             181.mcf                1772                1760     -0.65%
          186.crafty                2320                2245     -3.26%
          197.parser                1166                1163     -0.29%
             252.eon                2443                2576      5.44%
         253.perlbmk                2410                2433      0.93%
             254.gap                1987                1995      0.36%
          255.vortex                2392                2588      8.19%
           256.bzip2                1719                1729      0.56%
           300.twolf                2288                2248     -1.77%
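
The LTO results are mixed, so it helps to rank the benchmarks by how much they
moved. The following Python sketch does that from the rounded scores in the
Experiment 3 table. The names `lto_scores` and `percent_change` are
illustrative, and the recomputed percentages may differ slightly from the
table's last column, presumably because the table used unrounded scores (that
is an assumption, not something stated here).

```python
# Scores copied from the Experiment 3 table: (O2, O2 + LTO).
# Assumption: these are SPEC ratios, so a higher number is better.
lto_scores = {
    "164.gzip": (1324, 1317),
    "175.vpr": (1694, 1697),
    "176.gcc": (2293, 2291),
    "181.mcf": (1772, 1760),
    "186.crafty": (2320, 2245),
    "197.parser": (1166, 1163),
    "252.eon": (2443, 2576),
    "253.perlbmk": (2410, 2433),
    "254.gap": (1987, 1995),
    "255.vortex": (2392, 2588),
    "256.bzip2": (1719, 1729),
    "300.twolf": (2288, 2248),
}


def percent_change(base, new):
    """Relative change of new over base, in percent."""
    return (new / base - 1.0) * 100.0


changes = {name: percent_change(base, new)
           for name, (base, new) in lto_scores.items()}

# Benchmarks that got slower with LTO, worst first.
regressions = sorted((n for n, pct in changes.items() if pct < 0),
                     key=changes.get)

for name in sorted(changes, key=changes.get):
    print(f"{name:>12} {changes[name]:+6.2f}%")
```

Sorting this way makes it easy to see that the positive geomean is carried by
a couple of large wins (252.eon and 255.vortex) while half of the benchmarks
regress slightly.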


David


On Mon, Nov 15, 2010 at 9:54 AM, Jan Hubicka <hubi...@ucw.cz> wrote:
>> For peak, FDO is the most effective option. It can boost performance
>> by 7-10% depending on the program. The options you suggested probably
>> won't make too big a dent.  -funroll-loops can hurt performance
>> without profiling.  More aggressive inlining, ipa-cp, unswitching etc
>
> -funroll-loops was a 2.2% win overall on SPECint, and -funroll-all-loops
> 2.5%, the last time I noted down the SPECint results (that was in 2003, heh :)
> http://www.ucw.cz/~hubicka/papers/amd64/node4.html
>
>> enabled by O3 may help a little if there is any. -ffast-math won't
>> help for integer benchmarks other than eon.  Traditionally, O3 helps
>> FP performance because of the loop transformation enabled, but this
>> won't be the case for gcc for now.
>
> Function inlining definitely helps. -O3 also implies vectorization and other
> stuff.
>
> Honza
>>
>> Thanks,
>>
>> David
>>
>> On Mon, Nov 15, 2010 at 4:29 AM, Andrey Belevantsev <a...@ispras.ru> wrote:
>> > Hello,
>> >
>> > On 14.11.2010 0:08, Xinliang David Li wrote:
>> >>
>> >> I re-measured the performance difference using trunk gcc and trunk
>> >> clang/llvm on a core-2 box.  -fno-strict-aliasing is added to gcc
>> >> because clang/llvm's type-based aliasing is incomplete and not
>> >> enabled by default. I also added -fomit-frame-pointer to clang/llvm as
>> >> this is gcc's default. The base option is -O2.
>> >
>> > It would be very interesting to also compare peak numbers, i.e. with LTO
>> > and strict aliasing enabled, as well as -O3 and -ffast-math/-funroll-loops,
>> > similar to Vlad's or OpenSUSE's options.  Can you try to measure these?
>> > Maybe you can also run SPEC2k6, if there are enough machine resources, but
>> > that's probably asking too much...
>> >
>> > Andrey
>> >
>> >
>
