On Thu, Feb 4, 2021 at 7:45 AM Uros Bizjak <ubiz...@gmail.com> wrote:
>
> On Thu, Feb 4, 2021 at 5:28 AM Hongtao Liu <crazy...@gmail.com> wrote:
> >
> > > > > GCC11 will be the system GCC 2 years from now, and for the
> > > > > processors then, they shouldn't even need to split a 256-bit vector
> > > > > into two 128-bit vectors.
> > > > > i.e. testing SPEC2017 with the two options below on Zen3/ICL shows
> > > > > that option B is better than option A.
> > > > > Option A:
> > > > > -march=x86-64 -mtune=generic -mavx2 -mfma -Ofast
> > > > >
> > > > > Option B:
> > > > > Option A +
> > > > > -mtune-ctrl="256_unaligned_load_optimal,256_unaligned_store_optimal"
> > > > >
> > > > > Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> > > >
> > > > Given the explicit list for unaligned loads, it's a no-brainer to change
> > > > that for X86_TUNE_AVX256_UNALIGNED_LOAD_OPTIMAL. Given that both
> > > > BDVER and ZNVER1 are listed for X86_TUNE_AVX256_UNALIGNED_STORE_OPTIMAL,
> > > > we should try to benchmark the effect on ZNVER1 - Martin, do we still
> > > > have a znver1 machine around?
> > >
> > > They are also turned on for Sandybridge. I don't believe we should keep
> > > it in GCC 11 to penalize today's CPUs as well as CPUs in 2024.
> > >
> > I agree with H.J., and I would also like to hear Uros' opinion.
>
> I don't have any benchmark data to form my opinion on, but I
> definitely agree that the compiler should tune for the newer processors,
> where speed matters the most; 10-year-old processors are
> irrelevant as far as speed is concerned.
>
> So, if gcc-11 is expected to be most used 2-3 years from
> now, it should by default target the architecture that will be most
> used at that time. But I think that distribution maintainers should
> decide here.
I'm all for the change. The case where it could regress is odd anyway, as
it needs AVX2 enabled, and on CPUs with a 128-bit data path those shouldn't
be the preferred multilibs (thinking of the new x86_64-v2/v3 stuff).

Richard.

> Uros.