Hi,

On Thu, Jan 28 2021, Richard Biener wrote:
> On Thu, Jan 28, 2021 at 7:32 AM Hongtao Liu via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>> Hi:
>>   GCC11 will be the system GCC 2 years from now, and for the
>> processors then, they shouldn't even need to split a 256-bit vector
>> into 2 128-bits vectors.
>> .i.e. Test SPEC2017 with the below 2 options on Zen3/ICL show
>> option B is better than Option A.
>> Option A:
>> -march=x86-64 -mtune=generic -mavx2 -mfma -Ofast
>>
>> Option B:
>> Option A +
>> -mtune-ctrl="256_unaligned_load_optimal,256_unaligned_store_optimal"
>>
>> Bootstrapped and regtested on x86-64_iinux-gnu{-m32,}.
>
> Given the explicit list for unaligned loads it's a no-brainer to change that
> for X86_TUNE_AVX256_UNALIGNED_LOAD_OPTIMAL.  Given both
> BDVER and ZNVER1 are listed for X86_TUNE_AVX256_UNALIGNED_STORE_OPTIMAL
> we should try to benchmark the effect on ZNVER1 - Martin, do we still
> have a znver1 machine around?

Sorry, I kept forgetting about this and when I did not, the machine was
busy.

I did just one SPEC CPUrate (single threaded) reference run without and
with the patch (on top of 6b1633378b7), both times with -Ofast -mavx2
-mtune=generic and with LTO, and the results were actually rather good
(smaller is better):

SPEC 2017 FPrate (time):

| Benchmark       | Before | After |      % |
|-----------------+--------+-------+--------|
| 503.bwaves_r    |    217 |   209 |  -3.69 |
| 507.cactuBSSN_r |    236 |   235 |  -0.42 |
| 508.namd_r      |    252 |   242 |  -3.97 |
| 510.parest_r    |    384 |   383 |  -0.26 |
| 511.povray_r    |    486 |   495 |  +1.85 |
| 519.lbm_r       |    172 |   173 |  +0.58 |
| 521.wrf_r       |    292 |   277 |  -5.14 |
| 526.blender_r   |    300 |   303 |  +1.00 |
| 527.cam4_r      |    255 |   248 |  -2.75 |
| 538.imagick_r   |    400 |   400 |  +0.00 |
| 544.nab_r       |    316 |   316 |  +0.00 |
| 549.fotonik3d_r |    366 |   351 |  -4.10 |
| 554.roms_r      |    283 |   248 | -12.37 |
#+TBLFM: $4=100*$3/$2-100;%+.2f

SPEC 2017 INTrate (time):

| Benchmark       | Before | After |     % |
|-----------------+--------+-------+-------|
| 500.perlbench_r |    446 |   443 | -0.67 |
| 502.gcc_r       |    267 |   267 | +0.00 |
| 505.mcf_r       |    285 |   285 | +0.00 |
| 520.omnetpp_r   |    437 |   436 | -0.23 |
| 523.xalancbmk_r |    302 |   308 | +1.99 |
| 525.x264_r      |    217 |   219 | +0.92 |
| 531.deepsjeng_r |    316 |   311 | -1.58 |
| 541.leela_r     |    500 |   499 | -0.20 |
| 548.exchange2_r |    314 |   315 | +0.32 |
| 557.xz_r        |    391 |   392 | +0.26 |
#+TBLFM: $4=100*$3/$2-100;%+.2f

If we regard any regressions smaller than 2% as noise then there were
none.  And 554.roms_r really liked the change, even on znver1.

Martin
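
P.S. For anyone who wants to re-check the % column without org-mode, here is a minimal Python sketch of the same arithmetic as the table formula ($4=100*$3/$2-100, printed with %+.2f).  The times are copied from the tables in this mail; the names pct_change and regressions are made up for illustration, and the 2% cut-off is just the noise threshold assumed above, not anything SPEC defines.

```python
def pct_change(before, after):
    """Percent change in run time; negative means the patched run was faster."""
    return 100 * after / before - 100


def regressions(results, threshold=2.0):
    """Return benchmarks whose slowdown is at least `threshold` percent."""
    return {
        name: pct_change(before, after)
        for name, (before, after) in results.items()
        if pct_change(before, after) >= threshold
    }


# (before, after) run times copied from the tables in this mail.
FPRATE = {
    "503.bwaves_r": (217, 209), "507.cactuBSSN_r": (236, 235),
    "508.namd_r": (252, 242), "510.parest_r": (384, 383),
    "511.povray_r": (486, 495), "519.lbm_r": (172, 173),
    "521.wrf_r": (292, 277), "526.blender_r": (300, 303),
    "527.cam4_r": (255, 248), "538.imagick_r": (400, 400),
    "544.nab_r": (316, 316), "549.fotonik3d_r": (366, 351),
    "554.roms_r": (283, 248),
}
INTRATE = {
    "500.perlbench_r": (446, 443), "502.gcc_r": (267, 267),
    "505.mcf_r": (285, 285), "520.omnetpp_r": (437, 436),
    "523.xalancbmk_r": (302, 308), "525.x264_r": (217, 219),
    "531.deepsjeng_r": (316, 311), "541.leela_r": (500, 499),
    "548.exchange2_r": (314, 315), "557.xz_r": (391, 392),
}

if __name__ == "__main__":
    for suite, results in (("FPrate", FPRATE), ("INTrate", INTRATE)):
        print(suite)
        for name, (before, after) in results.items():
            # Same output format as the org-mode formula: %+.2f
            print(f"  {name:<16} {pct_change(before, after):+.2f}")
        print("  regressions >= 2%:", regressions(results) or "none")
```

Both suites should report no regressions at the 2% cut-off; lowering the threshold argument shows which benchmarks sit closest to it.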