On Thu, Feb 4, 2021 at 5:28 AM Hongtao Liu <crazy...@gmail.com> wrote:
> > > > GCC11 will be the system GCC 2 years from now, and for the
> > > > processors then, they shouldn't even need to split a 256-bit vector
> > > > into 2 128-bit vectors.
> > > > i.e. testing SPEC2017 with the below 2 options on Zen3/ICL shows
> > > > Option B is better than Option A.
> > > > Option A:
> > > > -march=x86-64 -mtune=generic -mavx2 -mfma -Ofast
> > > >
> > > > Option B:
> > > > Option A +
> > > > -mtune-ctrl="256_unaligned_load_optimal,256_unaligned_store_optimal"
> > > >
> > > > Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> > >
> > > Given the explicit list for unaligned loads it's a no-brainer to change
> > > that for X86_TUNE_AVX256_UNALIGNED_LOAD_OPTIMAL. Given both
> > > BDVER and ZNVER1 are listed for X86_TUNE_AVX256_UNALIGNED_STORE_OPTIMAL
> > > we should try to benchmark the effect on ZNVER1 - Martin, do we still
> > > have a znver1 machine around?
> >
> > They are also turned on for Sandybridge. I don't believe we should keep it
> > in GCC 11 to penalize today's CPUs as well as CPUs in 2024.
>
> I agree with H.J, and I would also like to hear Uros' opinion.

I don't have any benchmark data to base an opinion on, but I definitely
agree that the compiler should tune for the newer processors, where speed
matters the most; 10-year-old processors are irrelevant as far as speed is
concerned. So, if gcc-11 is expected to be most widely used 2-3 years from
now, it should by default target the architecture that will be most widely
used at that time.

But I think that distribution maintainers should decide here.

Uros.