Hi,

GCC 11 will be the system GCC two years from now, and the processors of that time should not need to split a 256-bit vector into two 128-bit vectors. In particular, testing SPEC2017 with the two options below on Zen3/ICL shows that Option B is better than Option A.

Option A: -march=x86-64 -mtune=generic -mavx2 -mfma -Ofast
Option B: Option A + -mtune-ctrl="256_unaligned_load_optimal,256_unaligned_store_optimal"

Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
Ok for trunk?

--
BR,
Hongtao
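
For reference, here is a minimal sketch of the kind of loop the two tuning flags affect. The function name add_arrays is hypothetical and not part of the patch; it only illustrates a kernel whose pointers carry no alignment guarantee, so the vectorizer has to use unaligned 256-bit loads and stores. With generic tuning as in Option A, GCC typically emits each such access as two 128-bit halves (a 128-bit vmovups plus vinsertf128 or vextractf128), while Option B should allow a single 256-bit vmovups.

```c
#include <stddef.h>

/* Hypothetical kernel, not taken from the patch.  Neither pointer is
   known to be 32-byte aligned, so a vectorized version of this loop
   needs unaligned 256-bit loads from a and b and an unaligned 256-bit
   store to a.  */
void
add_arrays (float *restrict a, const float *restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    a[i] += b[i];
}
```

Compiling this with gcc -S under Option A and again under Option B, then comparing the assembly, should show the difference. The exact instruction sequence depends on the GCC version.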
0001-Enable-X86_TUNE_AVX256_UNALIGNED_-LOAD-STORE-_OPTIMA.patch