On Sun, 6 Jan 2019, Jan Hubicka wrote:

> Hello,
> while running benchmarks for inliner tuning I also ran benchmarks
> comparing -O2 and -O2 -ftree-vectorize -ftree-slp-vectorize using Martin
> Liska's LNT setup (https://lnt.opensuse.org/). The results are
> summarized below, but you can also see the colorful table produced
> by Martin's LNT magic:
>
> https://lnt.opensuse.org/db_default/v4/SPEC/latest_runs_report?num_runs=3&min_percentage_change=0.02&revisions=746f%2C55f&fbclid=IwAR1EhvEnavV5Fg5g404cTrguOXG2cW7b3mRZZvtYn1qy93zihyAanZ7AiWQ
> https://lnt.opensuse.org/db_default/v4/CPP/latest_runs_report?num_runs=10&min_percentage_change=0.02&revisions=746f%2C55f
>
> Overall we got the following SPECrate improvements:
>
> SPECfp2k6   kabylake generic  +7.15%
> SPECfp2k6   kabylake native   +9.36%
> SPECfp2k17  kabylake generic  +5.36%
> SPECfp2k17  kabylake native   +6.03%
> SPECint2k17 kabylake generic  +4.13%
>
> SPECfp2k6   zen generic       +9.98%
> SPECfp2k6   zen native        +7.04%
> SPECfp2k17  zen generic       +6.11%
> SPECfp2k17  zen native        +5.46%
> SPECint2k17 zen generic       +3.61%
> SPECint2k17 zen native        +5.18%
>
> The performance results seem surprisingly strongly in favor of
> vectorization. Martin's setup is also checking code size, which goes up
> by as much as 26% on leslie3d, but since many of the benchmarks are small,
> this is not very representative of the overall code size/compile time
> costs of vectorization.
>
> I measured compile time/size on larger programs I have available, with
> notable changes on DealII, but otherwise sub-1% increases. I also
> benchmarked Firefox, but there are no significant differences because
> the build system already uses -O3 for places where it matters (graphics
> library etc.)

Well, as much as compile-time/size of SPEC is not representative the
performance improvements are.

>                      Compile time   code segment size
> Firefox mainline     in noise       0.8%
> gcc from spec2k6     0.5%           0.6%
> gdb                  0.8%           0.3%
> crafty               0%             0%
> DealII               3.2%           4%
>
> Note that I benchmarked -ftree-slp-vectorize separately before and
> the results were hit/miss, so perhaps enabling only -ftree-vectorize
> would give better compile time tradeoffs. I was worried about partial
> memory stalls, but I will benchmark it and also benchmark the difference
> between cost models.
>
> There are some performance regressions, most notably in SPEC
>  - exchange (all settings),
>  - gamess (all settings),
>  - calculix (Zen native only),
>  - bwaves (Zen native)
> and induct2 on all settings and tfft2 on Zen only from Polyhedron. Botan
> seems very noisy, but it is rather special code.
>
> Exchange can be fixed by adding a heuristic that it is a bad idea to
> vectorize within a loop nest of 10 containing a recursive call. I believe
> gamess and calculix are understood and I can look into the remaining
> cases.
>
> Overall I am surprised how many improvements vectorization at -O2 can do
> - clearly more parallel CPUs depend on it. In my experience
> from analyzing regressions of gcc -O2 compared to clang -O2 builds,
> vectorization is one of the most common reasons. Having gcc -O2 producing
> lower SPEC scores and comparably large binaries to clang -O2 does not
> feel OK and I think the problem is not limited just to artificial
> benchmarks.
>
> Even though it is late in the release cycle I wonder if we can do that
> for GCC 9? Performance of vectorization is very architecture specific; I
> would propose enabling vectorization for Zen, Core-based chips and
> generic in x86-64. I can also run benchmarks on Bulldozer. I can then
> tune down the cheap model to avoid some of the more expensive
> transformations.

I'd rather not do this now, it's _way_ too late (also considering you are
again doing inliner tuning so late).
See our last attempts at this btw.

Richard.

> Honza
>
>
> Kabylake SPEC2k6, generic tuning
>
> improvements:
> SPEC2006/FP/481.wrf            -31.33%
> SPEC2006/FP/436.cactusADM      -28.17%
> SPEC2006/FP/437.leslie3d       -17.21%
> SPEC2006/FP/434.zeusmp         -12.90%
> SPEC2006/FP/454.calculix        -6.44%
> SPEC2006/FP/433.milc            -6.03%
> SPEC2006/FP/459.GemsFDTD        -4.65%
> SPEC2006/FP/450.soplex          -2.11%
> SPEC2006/INT/403.gcc            -6.54%
> SPEC2006/INT/456.hmmer          -5.45%
> SPEC2006/INT/464.h264ref        -2.23%
> regressions:
> SPEC2006/FP/416.gamess           8.51%
> SPEC2006/FP/447.dealII           2.73%
>
> Kabylake SPEC2k6 -march=native
>
> improvements:
> SPEC2006/FP/436.cactusADM      -45.52%
> SPEC2006/FP/481.wrf            -34.13%
> SPEC2006/FP/434.zeusmp         -20.25%
> SPEC2006/FP/437.leslie3d       -19.44%
> SPEC2006/FP/459.GemsFDTD        -6.85%
> SPEC2006/FP/433.milc            -2.15%
> SPEC2006/INT/456.hmmer          -8.97%
> SPEC2006/INT/403.gcc            -7.07%
> SPEC2006/INT/464.h264ref        -3.00%
> regressions:
> SPEC2006/FP/416.gamess           7.97%
> SPEC2006/INT/483.xalancbmk       3.55%
> SPEC2006/INT/400.perlbench       2.61%
>
> Kabylake SPEC2k17 generic tuning
>
> improvements:
> SPEC2017/INT/525.x264_r        -33.24%
> SPEC2017/FP/521.wrf_r          -30.63%
> SPEC2017/FP/538.imagick_r       -9.16%
> SPEC2017/FP/554.roms_r          -6.29%
> SPEC2017/INT/523.xalancbmk      -5.69%
> SPEC2017/FP/527.cam4_r          -5.19%
> SPEC2017/INT/557.xz_r           -4.58%
> SPEC2017/FP/510.parest_r        -4.28%
> SPEC2017/FP/549.fotonik3d       -2.62%
> regressions:
> SPEC2017/INT/548.exchange2      12.54%
>
> Kabylake SPEC2k17 -march=native:
>
> improvements:
> SPEC2017/FP/521.wrf_r          -37.25%
> SPEC2017/INT/525.x264_r        -30.31%
> SPEC2017/FP/554.roms_r         -10.43%
> SPEC2017/FP/527.cam4_r         -10.05%
> SPEC2017/FP/549.fotonik3d       -7.82%
> SPEC2017/FP/510.parest_r        -4.48%
> regressions:
> SPEC2017/INT/548.exchange2      14.51%
> SPEC2017/INT/557.xz_r            3.17%
> SPEC2017/FP/519.lbm_r            2.22%
>
> Zen SPEC2k6 generic tuning
>
> improvements:
> SPEC2006/FP/436.cactusADM      -39.94%
> SPEC2006/FP/481.wrf            -33.44%
> SPEC2006/FP/437.leslie3d       -16.35%
> SPEC2006/FP/434.zeusmp         -15.83%
> SPEC2006/FP/433.milc           -13.53%
> SPEC2006/FP/454.calculix        -9.18%
> SPEC2006/INT/456.hmmer          -8.22%
> SPEC2006/FP/459.GemsFDTD        -7.53%
> SPEC2006/FP/447.dealII          -6.12%
> SPEC2006/INT/403.gcc            -3.67%
> SPEC2006/INT/464.h264ref        -2.92%
> SPEC2006/INT/401.bzip2          -2.07%
> regressions:
> SPEC2006/FP/416.gamess           8.06%
> SPEC2006/INT/400.perlbench       6.52%
> SPEC2006/INT/483.xalancbmk       3.84%
>
> Zen SPEC2k6 -march=native
>
> improvements:
> SPEC2006/FP/481.wrf            -31.55%
> SPEC2006/FP/436.cactusADM      -29.20%
> SPEC2006/FP/437.leslie3d       -16.91%
> SPEC2006/FP/433.milc           -14.39%
> SPEC2006/FP/434.zeusmp         -10.18%
> SPEC2006/INT/456.hmmer          -8.95%
> SPEC2006/FP/459.GemsFDTD        -7.23%
> SPEC2006/FP/447.dealII          -3.31%
> SPEC2006/INT/464.h264ref        -3.29%
> SPEC2006/FP/470.lbm             -2.83%
> SPEC2006/INT/403.gcc            -2.56%
> regressions:
> SPEC2006/FP/416.gamess           8.45%
> SPEC2006/FP/454.calculix        10.07%
>
> Zen SPEC2k17 generic tuning
>
> improvements:
> SPEC2017/INT/525.x264_r        -34.06%
> SPEC2017/FP/521.wrf_r          -29.71%
> SPEC2017/FP/538.imagick_r       -7.01%
> SPEC2017/FP/549.fotonik3d       -6.00%
> SPEC2017/FP/527.cam4_r          -5.95%
> SPEC2017/FP/510.parest_r        -5.93%
> SPEC2017/FP/554.roms_r          -5.42%
> SPEC2017/FP/503.bwaves_r        -4.46%
> SPEC2017/FP/511.povray_r        -3.76%
> SPEC2017/INT/523.xalancbmk      -3.10%
> SPEC2017/FP/507.cactuBSSN       -2.22%
> regressions:
> SPEC2017/INT/548.exchange2       8.41%
> SPEC2017/INT/505.mcf_r           2.05%
>
> Zen SPEC2k17 -march=native
>
> improvements:
> SPEC2017/INT/525.x264_r        -37.00%
> SPEC2017/FP/521.wrf_r          -28.70%
> SPEC2017/FP/538.imagick_r      -17.91%
> SPEC2017/FP/510.parest_r        -7.25%
> SPEC2017/FP/527.cam4_r          -5.52%
> SPEC2017/FP/554.roms_r          -5.10%
> SPEC2017/INT/523.xalancbmk      -3.82%
> SPEC2017/FP/549.fotonik3d       -2.52%
> SPEC2017/FP/507.cactuBSSN       -2.16%
> SPEC2017/INT/502.gcc            -2.12%
> regressions:
> SPEC2017/INT/548.exchange2       9.80%
> SPEC2017/FP/503.bwaves_r         7.81%
> SPEC2017/INT/531.deepsjeng       2.16%
>
>
> Kabylake Polyhedron generic
>
> improvements:
> tfft2      -23.05%
> test_fpu2  -18.89%
> gas_dyn2   -13.55%
> linpk       -7.77%
> rnflow      -2.52%
> nf          -2.24%
> regressions:
> air          3.76%
> induct2    216.41%
>
> Zen Polyhedron generic
>
> improvements:
> gas_dyn2   -36.10%
> test_fpu2  -20.97%
> linpk       -6.29%
> channel2    -5.04%
> fatigue2    -3.43%
> nf          -3.07%
> capacita    -2.30%
> regressions:
> induct2    231.04%
> tfft2       34.25%
> protein      4.81%
>
> Kabylake C++ benchmarks generic
>
> improvements:
> nbench/NEURAL NET               34.01%
> botan/CMAC(AES-128) mac         21.62%
> botan/AES-128/CBC/PKCS7 enc     21.25%
> botan/AES-128/CBC/PKCS7 dec     18.43%
> nbench/LU DECOMPOSITION         13.42%
> botan/AES-128/EAX encrypt       10.93%
> botan/AES-128/EAX decrypt       10.50%
> botan/AES-128/OCB encrypt        9.84%
> botan/AES-128/OCB decrypt        9.29%
> nbench/ASSIGNMENT                6.15%
> botan/AES-128/XTS decrypt        3.74%
> botan/AES-128/XTS encrypt        3.64%
> botan/CTR-BE(AES-128) encr       2.61%
> botan/CTR-BE(AES-128) decr       2.56%
> botan/AES-128/GCM(16) enct       2.52%
> botan/AES-128/GCM(16) decr       2.01%
> regressions:
> botan/Whirlpool hash           -11.35%
> nbench/HUFFMAN                  -2.31%
> botan/Keccak-1600(512) hash     -3.61%
> botan/Tiger(24,3) hash          -2.94%
>
> Zen C++ benchmarks generic
>
> improvements:
> nbench/NEURAL NET               47.78%
> botan/AES-128/CBC/PKCS7 encr    21.07%
> botan/CMAC(AES-128) mac         19.97%
> botan/CTR-BE(AES-128) encr      15.21%
> botan/CTR-BE(AES-128) decr      14.24%
> botan/AES-128/EAX encrypt       13.46%
> botan/AES-128/EAX decrypt       12.84%
> nbench/LU DECOMPOSITION          9.12%
> botan/AES-128/GCM(16) encr       5.66%
> botan/AES-128/GCM(16) decr       4.40%
> botan/AES-128/CBC/PKCS7 decr     2.96%
> botan/ChaCha20Poly1305 decr      2.67%
> botan/AES-128/XTS encrypt        2.53%
> botan/Salsa20 encrypt            2.33%
> botan/Skein-512(512) hash        2.22%
> botan/ChaCha20Poly1305 encr      2.14%
> regressions:
> nbench/HUFFMAN                 -12.51%
> botan/Whirlpool hash            -8.26%
> botan/Camellia-192 encrypt      -7.12%
> botan/Camellia-256 decrypt      -7.07%
> botan/Camellia-192 decrypt      -6.82%
> botan/Camellia-128 decrypt      -6.73%
> botan/Camellia-256 encrypt      -6.59%
> botan/AES-128/XTS decrypt       -6.31%
> botan/Camellia-128 encrypt      -6.30%
> botan/XTEA decrypt              -4.87%
> nbench/ASSIGNMENT               -4.85%
> botan/AES-128/OCB encrypt       -3.36%
> botan/Keccak-1600(512) hash     -3.08%
> botan/AES-128 decrypt           -2.52%
> botan/SHA-160 hash              -2.31%
>
> Binary sizes and other stats are in the aforementioned links.
>
>

-- 
Richard Biener <rguent...@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nuernberg)