Just measured: LTO + O3 improves over O2 by a decent 4.8% (geomean). More
data will come later.
                O2    LTO+O3    change
164.gzip      1324      1322    -0.10%
175.vpr       1694      1703     0.51%
176.gcc       2293      2347     2.34%
181.mcf       1772      1797     1.43%
186.crafty    2320      2486     7.12%
197.parser    1166      1236     6.02%
252.eon       2443      2810    14.98%
253.perlbmk   2410      2407    -0.16%
254.gap       1987      2024     1.82%
255.vortex    2392      2826    18.13%
256.bzip2     1719      1760     2.38%
300.twolf     2288      2394     4.63%
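
In case anyone wants to re-check the geomean, here is a minimal sketch in
Python. It assumes the first column of the table above is the O2 score and
the second is the LTO + O3 score (higher is better); the names SCORES and
geomean_improvement are mine, purely for illustration. With the rounded
scores from the table it should land at roughly the 4.8% quoted above.

```python
import math

# (benchmark, O2 score, LTO+O3 score), copied from the table above.
SCORES = [
    ("164.gzip", 1324, 1322),
    ("175.vpr", 1694, 1703),
    ("176.gcc", 2293, 2347),
    ("181.mcf", 1772, 1797),
    ("186.crafty", 2320, 2486),
    ("197.parser", 1166, 1236),
    ("252.eon", 2443, 2810),
    ("253.perlbmk", 2410, 2407),
    ("254.gap", 1987, 2024),
    ("255.vortex", 2392, 2826),
    ("256.bzip2", 1719, 1760),
    ("300.twolf", 2288, 2394),
]


def geomean_improvement(rows):
    """Return the geometric-mean improvement of new over base, in percent."""
    # Averaging the logs of the per-benchmark ratios and exponentiating
    # gives the geometric mean of the ratios.
    log_sum = sum(math.log(new / base) for _, base, new in rows)
    return (math.exp(log_sum / len(rows)) - 1.0) * 100.0


if __name__ == "__main__":
    print(f"geomean improvement: {geomean_improvement(SCORES):.1f}%")
```

The same function can be pointed at the O2 vs. O2 unroll-loops or O2 vs. O2
LTO numbers quoted below by swapping in those columns.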
David
On Mon, Nov 15, 2010 at 2:38 PM, Jan Hubicka <[email protected]> wrote:
>> I did some measurements (64-bit).
>>
>> Experiment 1:
>>
>> -O2 -funroll-loops vs -O2
>>
>> It improves performance (geomean) by 0.56%, not too much:
>>                 O2    O2 unroll-loops
>> 164.gzip      1324      1331     0.56%
>> 175.vpr       1694      1605    -5.24%
>> 176.gcc       2293      2350     2.47%
>> 181.mcf       1772      1788     0.90%
>> 186.crafty    2320      2326     0.26%
>> 197.parser    1166      1162    -0.32%
>> 252.eon       2443      2529     3.50%
>> 253.perlbmk   2410      2460     2.07%
>> 254.gap       1987      2019     1.58%
>> 255.vortex    2392      2406     0.58%
>> 256.bzip2     1719      1715    -0.25%
>> 300.twolf     2288      2308     0.88%
>
> Can you also try -funroll-all-loops? For pretty small programs like
> spec2k, -funroll-all-loops is often a win. Only in a few loops can we
> work out the number of iterations.
>
>>
>> Experiment 3: O2 lto vs O2: geomean 0.72%
>>                 O2    O2 LTO
>> 164.gzip      1324      1317    -0.53%
>> 175.vpr       1694      1697     0.18%
>> 176.gcc       2293      2291    -0.08%
>> 181.mcf       1772      1760    -0.65%
>> 186.crafty    2320      2245    -3.26%
>> 197.parser    1166      1163    -0.29%
>> 252.eon       2443      2576     5.44%
>> 253.perlbmk   2410      2433     0.93%
>> 254.gap       1987      1995     0.36%
>> 255.vortex    2392      2588     8.19%
>> 256.bzip2     1719      1729     0.56%
>> 300.twolf     2288      2248    -1.77%
>
> You need -O3 -fwhole-program -flto for reasonable cross-module inlining to
> happen.
> -fwhole-program is quite essential to get a reasonable win from LTO (w/o
> profile feedback).
>
> At least our nightly tester then gets quite nice improvements on a few
> benchmarks in spec2k; see also my gccsummit slides.
>
> Honza
>