On Thu, Jan 28, 2021 at 1:21 AM Richard Biener via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Thu, Jan 28, 2021 at 7:32 AM Hongtao Liu via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Hi:
> >   GCC11 will be the system GCC 2 years from now, and for the
> > processors then, they shouldn't even need to split a 256-bit vector
> > into 2 128-bit vectors.
> >   i.e. Testing SPEC2017 with the below 2 options on Zen3/ICL shows
> > option B is better than Option A.
> > Option A:
> > -march=x86-64 -mtune=generic -mavx2 -mfma -Ofast
> >
> > Option B:
> > Option A +
> > -mtune-ctrl="256_unaligned_load_optimal,256_unaligned_store_optimal"
> >
> >   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
>
> Given the explicit list for unaligned loads it's a no-brainer to change that
> for X86_TUNE_AVX256_UNALIGNED_LOAD_OPTIMAL.  Given both
> BDVER and ZNVER1 are listed for X86_TUNE_AVX256_UNALIGNED_STORE_OPTIMAL
> we should try to benchmark the effect on ZNVER1 - Martin, do we still
> have a znver1 machine around?

They are also turned on for Sandybridge.  I don't believe we
should keep it in GCC 11 to penalize today's CPUs as well as
CPUs in 2024.

> Note that with the settings differing in a way to split stores but not to
> split loads, loading a just stored value can cause bad STLF and quite a
> performance hit (since znver1 has 128bit data paths that shouldn't
> be an issue there but it would have an issue for actually aligned data
> on CPUs with 256bit data paths).
>
> Thanks,
> Richard.
>
> > Ok for trunk?
> >
> > --
> > BR,
> > Hongtao

-- 
H.J.
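
For readers following the STLF (store-to-load forwarding) point in the thread, here is a minimal sketch of the access pattern in question: a 256-bit unaligned store followed by a 256-bit reload of the same bytes. The helper name scale_and_sum is hypothetical and not part of the patch. It uses the GCC/Clang vector extension with memcpy so that it builds with or without AVX enabled. In a function this small an optimizing compiler will usually keep the value in a register, so treat it as an illustration of the pattern, not a benchmark.

```c
#include <string.h>

/* 256-bit vector of eight floats (GCC/Clang vector extension). */
typedef float v8sf __attribute__((vector_size(32)));

/* Hypothetical helper, for illustration only: scale eight floats in
   place at a possibly unaligned address, then reload them and return
   their sum. */
static float scale_and_sum(float *p, float k)
{
    v8sf v;
    memcpy(&v, p, sizeof v);   /* unaligned 256-bit load */
    v = v * k;                 /* scalar k is broadcast to all lanes */
    memcpy(p, &v, sizeof v);   /* unaligned 256-bit store */

    /* Reload of the just-stored 32 bytes.  If the tuning splits the
       store into two 128-bit halves but keeps this load at 256 bits,
       this is the case the thread describes as causing bad STLF. */
    v8sf w;
    memcpy(&w, p, sizeof w);

    float s = 0.0f;
    for (int i = 0; i < 8; i++)
        s += w[i];
    return s;
}
```

Compiling a file like this with Option A and again with Option B from the quoted mail, and comparing the assembly produced by -S, is one way to check whether the 256-bit accesses are split on a given GCC build.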