Hello,
while running benchmarks for inliner tuning I also ran benchmarks comparing -O2 and -O2 -ftree-vectorize -ftree-slp-vectorize using Martin Liska's LNT setup (https://lnt.opensuse.org/). The results are summarized below, but you can also see the colorful tables produced by Martin's LNT magic:
https://lnt.opensuse.org/db_default/v4/SPEC/latest_runs_report?num_runs=3&min_percentage_change=0.02&revisions=746f%2C55f&fbclid=IwAR1EhvEnavV5Fg5g404cTrguOXG2cW7b3mRZZvtYn1qy93zihyAanZ7AiWQ
https://lnt.opensuse.org/db_default/v4/CPP/latest_runs_report?num_runs=10&min_percentage_change=0.02&revisions=746f%2C55f

Overall we got the following SPECrate improvements:

SPECfp2k6    kabylake  generic  +7.15%
SPECfp2k6    kabylake  native   +9.36%
SPECfp2k17   kabylake  generic  +5.36%
SPECfp2k17   kabylake  native   +6.03%
SPECint2k17  kabylake  generic  +4.13%
SPECfp2k6    zen       generic  +9.98%
SPECfp2k6    zen       native   +7.04%
SPECfp2k17   zen       generic  +6.11%
SPECfp2k17   zen       native   +5.46%
SPECint2k17  zen       generic  +3.61%
SPECint2k17  zen       native   +5.18%

The performance results seem surprisingly strongly in favor of vectorization.

Martin's setup also checks code size, which goes up by as much as 26% on leslie3d, but since many of the benchmarks are small, this is not very representative of the overall code size/compile time costs of vectorization. I measured compile time and size on the larger programs I have available, with notable changes on DealII but otherwise sub-1% increases. I also benchmarked Firefox, but there are no significant differences because the build system already uses -O3 in places where it matters (graphics library etc.).

                   Compile time   code segment size
Firefox mainline   in noise       0.8%
gcc from spec2k6   0.5%           0.6%
gdb                0.8%           0.3%
crafty             0%             0%
DealII             3.2%           4%

Note that I benchmarked -ftree-slp-vectorize separately before and the results were hit/miss, so perhaps enabling only -ftree-vectorize would give a better compile time tradeoff. I was worried about partial memory stalls, but I will benchmark that and also benchmark the difference between cost models.

There are some performance regressions, most notably in SPEC:
 - exchange (all settings),
 - gamess (all settings),
 - calculix (Zen native only),
 - bwaves (Zen native)
and, from Polyhedron, induct2 on all settings and tfft2 on Zen only. Botan seems very noisy, but it is rather special code.

Exchange can be fixed by adding a heuristic that it is a bad idea to vectorize within a loop nest of depth 10 containing a recursive call. I believe gamess and calculix are understood, and I can look into the remaining cases.

Overall I am surprised how many improvements vectorization at -O2 can bring; clearly, more parallel CPUs depend on it. In my experience from analyzing regressions of gcc -O2 compared to clang -O2 builds, vectorization is one of the most common reasons. Having gcc -O2 produce lower SPEC scores and comparably large binaries to clang -O2 does not feel OK, and I think the problem is not limited to artificial benchmarks.

Even though it is late in the release cycle, I wonder if we can do that for GCC 9? The performance of vectorization is very architecture specific, so I would propose enabling vectorization for Zen, Core-based chips and generic tuning on x86-64. I can also run benchmarks on Bulldozer. I can then tune down the cheap model to avoid some of the more expensive transformations.

Honza

Kabylake SPEC2k6, generic tuning
improvements:
SPEC2006/FP/481.wrf              -31.33%
SPEC2006/FP/436.cactusADM        -28.17%
SPEC2006/FP/437.leslie3d         -17.21%
SPEC2006/FP/434.zeusmp           -12.90%
SPEC2006/FP/454.calculix          -6.44%
SPEC2006/FP/433.milc              -6.03%
SPEC2006/FP/459.GemsFDTD          -4.65%
SPEC2006/FP/450.soplex            -2.11%
SPEC2006/INT/403.gcc              -6.54%
SPEC2006/INT/456.hmmer            -5.45%
SPEC2006/INT/464.h264ref          -2.23%
regressions:
SPEC2006/FP/416.gamess             8.51%
SPEC2006/FP/447.dealII             2.73%

Kabylake SPEC2k6 -march=native
improvements:
SPEC2006/FP/436.cactusADM        -45.52%
SPEC2006/FP/481.wrf              -34.13%
SPEC2006/FP/434.zeusmp           -20.25%
SPEC2006/FP/437.leslie3d         -19.44%
SPEC2006/FP/459.GemsFDTD          -6.85%
SPEC2006/FP/433.milc              -2.15%
SPEC2006/INT/456.hmmer            -8.97%
SPEC2006/INT/403.gcc              -7.07%
SPEC2006/INT/464.h264ref          -3.00%
regressions:
SPEC2006/FP/416.gamess             7.97%
SPEC2006/INT/483.xalancbmk         3.55%
SPEC2006/INT/400.perlbench         2.61%

Kabylake SPEC2k17 generic tuning
improvements:
SPEC2017/INT/525.x264_r          -33.24%
SPEC2017/FP/521.wrf_r            -30.63%
SPEC2017/FP/538.imagick_r         -9.16%
SPEC2017/FP/554.roms_r            -6.29%
SPEC2017/INT/523.xalancbmk        -5.69%
SPEC2017/FP/527.cam4_r            -5.19%
SPEC2017/INT/557.xz_r             -4.58%
SPEC2017/FP/510.parest_r          -4.28%
SPEC2017/FP/549.fotonik3d         -2.62%
regressions:
SPEC2017/INT/548.exchange2        12.54%

Kabylake SPEC2k17 -march=native
improvements:
SPEC2017/FP/521.wrf_r            -37.25%
SPEC2017/INT/525.x264_r          -30.31%
SPEC2017/FP/554.roms_r           -10.43%
SPEC2017/FP/527.cam4_r           -10.05%
SPEC2017/FP/549.fotonik3d         -7.82%
SPEC2017/FP/510.parest_r          -4.48%
regressions:
SPEC2017/INT/548.exchange2        14.51%
SPEC2017/INT/557.xz_r              3.17%
SPEC2017/FP/519.lbm_r              2.22%

Zen SPEC2k6 generic tuning
improvements:
SPEC2006/FP/436.cactusADM        -39.94%
SPEC2006/FP/481.wrf              -33.44%
SPEC2006/FP/437.leslie3d         -16.35%
SPEC2006/FP/434.zeusmp           -15.83%
SPEC2006/FP/433.milc             -13.53%
SPEC2006/FP/454.calculix          -9.18%
SPEC2006/INT/456.hmmer            -8.22%
SPEC2006/FP/459.GemsFDTD          -7.53%
SPEC2006/FP/447.dealII            -6.12%
SPEC2006/INT/403.gcc              -3.67%
SPEC2006/INT/464.h264ref          -2.92%
SPEC2006/INT/401.bzip2            -2.07%
regressions:
SPEC2006/FP/416.gamess             8.06%
SPEC2006/INT/400.perlbench         6.52%
SPEC2006/INT/483.xalancbmk         3.84%

Zen SPEC2k6 -march=native
improvements:
SPEC2006/FP/481.wrf              -31.55%
SPEC2006/FP/436.cactusADM        -29.20%
SPEC2006/FP/437.leslie3d         -16.91%
SPEC2006/FP/433.milc             -14.39%
SPEC2006/FP/434.zeusmp           -10.18%
SPEC2006/INT/456.hmmer            -8.95%
SPEC2006/FP/459.GemsFDTD          -7.23%
SPEC2006/FP/447.dealII            -3.31%
SPEC2006/INT/464.h264ref          -3.29%
SPEC2006/FP/470.lbm               -2.83%
SPEC2006/INT/403.gcc              -2.56%
regressions:
SPEC2006/FP/416.gamess             8.45%
SPEC2006/FP/454.calculix          10.07%

Zen SPEC2k17 generic tuning
improvements:
SPEC2017/INT/525.x264_r          -34.06%
SPEC2017/FP/521.wrf_r            -29.71%
SPEC2017/FP/538.imagick_r         -7.01%
SPEC2017/FP/549.fotonik3d         -6.00%
SPEC2017/FP/527.cam4_r            -5.95%
SPEC2017/FP/510.parest_r          -5.93%
SPEC2017/FP/554.roms_r            -5.42%
SPEC2017/FP/503.bwaves_r          -4.46%
SPEC2017/FP/511.povray_r          -3.76%
SPEC2017/INT/523.xalancbmk        -3.10%
SPEC2017/FP/507.cactuBSSN         -2.22%
regressions:
SPEC2017/INT/548.exchange2         8.41%
SPEC2017/INT/505.mcf_r             2.05%

Zen SPEC2k17 -march=native
improvements:
SPEC2017/INT/525.x264_r          -37.00%
SPEC2017/FP/521.wrf_r            -28.70%
SPEC2017/FP/538.imagick_r        -17.91%
SPEC2017/FP/510.parest_r          -7.25%
SPEC2017/FP/527.cam4_r            -5.52%
SPEC2017/FP/554.roms_r            -5.10%
SPEC2017/INT/523.xalancbmk        -3.82%
SPEC2017/FP/549.fotonik3d         -2.52%
SPEC2017/FP/507.cactuBSSN         -2.16%
SPEC2017/INT/502.gcc              -2.12%
regressions:
SPEC2017/INT/548.exchange2         9.80%
SPEC2017/FP/503.bwaves_r           7.81%
SPEC2017/INT/531.deepsjeng         2.16%

Kabylake Polyhedron generic
improvements:
tfft2                            -23.05%
test_fpu2                        -18.89%
gas_dyn2                         -13.55%
linpk                             -7.77%
rnflow                            -2.52%
nf                                -2.24%
regressions:
air                                3.76%
induct2                          216.41%

Zen Polyhedron generic
improvements:
gas_dyn2                         -36.10%
test_fpu2                        -20.97%
linpk                             -6.29%
channel2                          -5.04%
fatigue2                          -3.43%
nf                                -3.07%
capacita                          -2.30%
regressions:
induct2                          231.04%
tfft2                             34.25%
protein                            4.81%

Kabylake C++ benchmarks generic
improvements:
nbench/NEURAL NET                 34.01%
botan/CMAC(AES-128) mac           21.62%
botan/AES-128/CBC/PKCS7 enc       21.25%
botan/AES-128/CBC/PKCS7 dec       18.43%
nbench/LU DECOMPOSITION           13.42%
botan/AES-128/EAX encrypt         10.93%
botan/AES-128/EAX decrypt         10.50%
botan/AES-128/OCB encrypt          9.84%
botan/AES-128/OCB decrypt          9.29%
nbench/ASSIGNMENT                  6.15%
botan/AES-128/XTS decrypt          3.74%
botan/AES-128/XTS encrypt          3.64%
botan/CTR-BE(AES-128) encr         2.61%
botan/CTR-BE(AES-128) decr         2.56%
botan/AES-128/GCM(16) encr         2.52%
botan/AES-128/GCM(16) decr         2.01%
regressions:
botan/Whirlpool hash             -11.35%
nbench/HUFFMAN                    -2.31%
botan/Keccak-1600(512) hash       -3.61%
botan/Tiger(24,3) hash            -2.94%

Zen C++ benchmarks generic
improvements:
nbench/NEURAL NET                 47.78%
botan/AES-128/CBC/PKCS7 encr      21.07%
botan/CMAC(AES-128) mac           19.97%
botan/CTR-BE(AES-128) encr        15.21%
botan/CTR-BE(AES-128) decr        14.24%
botan/AES-128/EAX encrypt         13.46%
botan/AES-128/EAX decrypt         12.84%
nbench/LU DECOMPOSITION            9.12%
botan/AES-128/GCM(16) encr         5.66%
botan/AES-128/GCM(16) decr         4.40%
botan/AES-128/CBC/PKCS7 decr       2.96%
botan/ChaCha20Poly1305 decr        2.67%
botan/AES-128/XTS encrypt          2.53%
botan/Salsa20 encrypt              2.33%
botan/Skein-512(512) hash          2.22%
botan/ChaCha20Poly1305 encr        2.14%
regressions:
nbench/HUFFMAN                   -12.51%
botan/Whirlpool hash              -8.26%
botan/Camellia-192 encrypt        -7.12%
botan/Camellia-256 decrypt        -7.07%
botan/Camellia-192 decrypt        -6.82%
botan/Camellia-128 decrypt        -6.73%
botan/Camellia-256 encrypt        -6.59%
botan/AES-128/XTS decrypt         -6.31%
botan/Camellia-128 encrypt        -6.30%
botan/XTEA decrypt                -4.87%
nbench/ASSIGNMENT                 -4.85%
botan/AES-128/OCB encrypt         -3.36%
botan/Keccak-1600(512) hash       -3.08%
botan/AES-128 decrypt             -2.52%
botan/SHA-160 hash                -2.31%

Binary sizes and other stats are in the aforementioned links.
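
As a reading aid for the per-benchmark lists above, here is a minimal sketch of how a pair of measurements could be turned into a percentage and sorted into improvements, regressions or noise. It is not part of Martin's LNT setup. The function names are hypothetical, the 2% cut-off is an assumption that mirrors min_percentage_change=0.02 in the report URLs, and the lower_is_better switch is an assumption inferred from the signs in the lists: the SPEC and Polyhedron entries look like run times (negative is better), while the botan/nbench entries look like throughput (positive is better).

```python
def percent_change(baseline, candidate):
    """Relative change of candidate versus baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0


def classify(baseline, candidate, lower_is_better=True, threshold=2.0):
    """Label a change as 'improvement', 'regression' or 'noise'.

    threshold: assumed cut-off in percent, mirroring
               min_percentage_change=0.02 in the report URLs.
    lower_is_better: assumption about the metric; True for run time,
                     False for throughput-style scores.
    """
    change = percent_change(baseline, candidate)
    if abs(change) < threshold:
        return "noise"
    got_smaller = change < 0
    # A smaller value is good only when lower is better, and vice versa.
    return "improvement" if got_smaller == lower_is_better else "regression"
```

For example, with hypothetical timings of 100 s at -O2 and 68.67 s with vectorization, percent_change gives about -31.33 and classify reports an improvement; a hypothetical throughput score going from 100 to 88.65 with lower_is_better=False is reported as a regression.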