I did some measurements (64-bit).

Experiment 1: -O2 -funroll-loops vs -O2

It improves performance (geomean) by 0.56%, not too much:

                  O2    O2 unroll-loops
164.gzip        1324    1331     0.56%
175.vpr         1694    1605    -5.24%
176.gcc         2293    2350     2.47%
181.mcf         1772    1788     0.90%
186.crafty      2320    2326     0.26%
197.parser      1166    1162    -0.32%
252.eon         2443    2529     3.50%
253.perlbmk     2410    2460     2.07%
254.gap         1987    2019     1.58%
255.vortex      2392    2406     0.58%
256.bzip2       1719    1715    -0.25%
300.twolf       2288    2308     0.88%

Experiment 2: O3 vs O2

The improvement on SPEC2k (geomean 2.38%) is larger than on the large
internal programs tested.

                  O2    O3
164.gzip        1324    1329     0.40%
175.vpr         1694    1700     0.31%
176.gcc         2293    2336     1.89%
181.mcf         1772    1739    -1.81%
186.crafty      2320    2323     0.14%
197.parser      1166    1252     7.39%
252.eon         2443    2645     8.23%
253.perlbmk     2410    2452     1.74%
254.gap         1987    2020     1.62%
255.vortex      2392    2473     3.39%
256.bzip2       1719    1766     2.74%
300.twolf       2288    2350     2.70%

Experiment 3: O2 LTO vs O2

Geomean 0.72%:

                  O2    O2 LTO
164.gzip        1324    1317    -0.53%
175.vpr         1694    1697     0.18%
176.gcc         2293    2291    -0.08%
181.mcf         1772    1760    -0.65%
186.crafty      2320    2245    -3.26%
197.parser      1166    1163    -0.29%
252.eon         2443    2576     5.44%
253.perlbmk     2410    2433     0.93%
254.gap         1987    1995     0.36%
255.vortex      2392    2588     8.19%
256.bzip2       1719    1729     0.56%
300.twolf       2288    2248    -1.77%

David

On Mon, Nov 15, 2010 at 9:54 AM, Jan Hubicka <hubi...@ucw.cz> wrote:
>> For peak, FDO is the most effective option. It can boost performance
>> by 7-10% depending on the program. The options you suggested probably
>> won't make too big a dent. -funroll-loops can hurt performance
>> without profiling. More aggressive inlining, ipa-cp, unswitching etc
>
> -funroll-loops overall was a 2.2% win on SPECint, -funroll-all-loops 2.5%,
> last time I noted down the SPECint results of this (that was in 2003, heh :)
> http://www.ucw.cz/~hubicka/papers/amd64/node4.html
>
>> enabled by O3 may help a little if there is any. -ffast-math won't
>> help for integer benchmarks other than eon. Traditionally, O3 helps
>> FP performance because of the loop transformation enabled, but this
>> won't be the case for gcc for now.
>
> Function inlining definitely helps. -O3 also implies vectorization and
> other stuff.
>
> Honza
>>
>> Thanks,
>>
>> David
>>
>> On Mon, Nov 15, 2010 at 4:29 AM, Andrey Belevantsev <a...@ispras.ru> wrote:
>> > Hello,
>> >
>> > On 14.11.2010 0:08, Xinliang David Li wrote:
>> >>
>> >> I re-measured the performance difference using trunk gcc and trunk
>> >> clang/llvm on a core-2 box. -fno-strict-aliasing is added to gcc
>> >> because clang/llvm's type based aliasing is incomplete and not
>> >> enabled by default. I also added -fomit-frame-pointer to clang/llvm as
>> >> this is gcc's default. The base option is -O2.
>> >
>> > It would be very interesting to compare also peak numbers, i.e. with LTO
>> > and strict aliasing enabled, as well as -O3 and
>> > -ffast-math/-funroll-loops, similar to Vlad's or OpenSUSE's options.
>> > Can you try to measure these? Maybe you can also run SPEC2k6, if there
>> > are enough machine resources, but that's probably asking too much...
>> >
>> > Andrey
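
For anyone who wants to re-check the geomean figures reported for the experiments at the top of this message, here is a minimal Python sketch. It assumes the geomean is the geometric mean of the per-benchmark score ratios (new / base) minus one. The exact method used to produce the numbers is not stated in the thread, and the names `EXPERIMENT_1` and `geomean_improvement` are made up for illustration. The scores are the Experiment 1 values (-O2 vs -O2 -funroll-loops).

```python
import math

# Experiment 1 scores copied from this message:
# (benchmark, -O2 score, -O2 -funroll-loops score)
EXPERIMENT_1 = [
    ("164.gzip", 1324, 1331),
    ("175.vpr", 1694, 1605),
    ("176.gcc", 2293, 2350),
    ("181.mcf", 1772, 1788),
    ("186.crafty", 2320, 2326),
    ("197.parser", 1166, 1162),
    ("252.eon", 2443, 2529),
    ("253.perlbmk", 2410, 2460),
    ("254.gap", 1987, 2019),
    ("255.vortex", 2392, 2406),
    ("256.bzip2", 1719, 1715),
    ("300.twolf", 2288, 2308),
]


def geomean_improvement(pairs):
    """Geometric-mean improvement in percent for (base, new) score pairs.

    Assumption: higher scores are better, so improvement is new / base.
    """
    # Average the logs of the ratios, then exponentiate, which is the
    # geometric mean of the ratios without multiplying many floats together.
    log_sum = sum(math.log(new / base) for base, new in pairs)
    return (math.exp(log_sum / len(pairs)) - 1.0) * 100.0


if __name__ == "__main__":
    pairs = [(base, new) for _, base, new in EXPERIMENT_1]
    print(f"geomean improvement: {geomean_improvement(pairs):.2f}%")
```

Swapping in the Experiment 2 or Experiment 3 columns gives the corresponding figure. Small differences from the quoted percentages are expected, since the scores listed here appear to be rounded.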