On Thu, Jan 28, 2021 at 9:18 PM H.J. Lu <hjl.to...@gmail.com> wrote:
>
> On Thu, Jan 28, 2021 at 1:21 AM Richard Biener via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > On Thu, Jan 28, 2021 at 7:32 AM Hongtao Liu via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > Hi:
> > > GCC 11 will be the system GCC two years from now, and the processors
> > > of that time shouldn't even need to split a 256-bit vector into two
> > > 128-bit vectors.
> > > For example, testing SPEC2017 with the two options below on Zen3/ICL
> > > shows that option B is better than option A.
> > > Option A:
> > > -march=x86-64 -mtune=generic -mavx2 -mfma -Ofast
> > >
> > > Option B:
> > > Option A +
> > > -mtune-ctrl="256_unaligned_load_optimal,256_unaligned_store_optimal"
> > >
> > > Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> >
> > Given the explicit list for unaligned loads, it's a no-brainer to change that
> > for X86_TUNE_AVX256_UNALIGNED_LOAD_OPTIMAL. Given that both
> > BDVER and ZNVER1 are listed for X86_TUNE_AVX256_UNALIGNED_STORE_OPTIMAL,
> > we should try to benchmark the effect on ZNVER1. Martin, do we still
> > have a znver1 machine around?
>
> They are also turned on for Sandybridge. I don't believe we should keep it
> in GCC 11 and penalize today's CPUs as well as the CPUs of 2024.
>
I agree with H.J., and I would also like to hear Uros' opinion.
> >
> > Note that if the settings differ so that stores are split but loads are
> > not, loading a just-stored value can defeat store-to-load forwarding (STLF)
> > and cause quite a performance hit. Since znver1 has 128-bit data paths,
> > that shouldn't be an issue there, but it would be an issue for actually
> > aligned data on CPUs with 256-bit data paths.
> >
> > Thanks,
> > Richard.
> >
> > > Ok for trunk?
> > >
> > > --
> > > BR,
> > > Hongtao
>
> --
> H.J.
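To make concrete what "splitting a 256-bit vector into two 128-bit vectors" means for an unaligned load, here is a minimal sketch written by hand with intrinsics. It assumes GCC or Clang on x86-64; the helper names (load_256_unsplit, load_256_split, split_matches_unsplit) are hypothetical and only illustrate the two instruction shapes, one vmovups to a ymm register versus two 128-bit loads recombined with vinsertf128. It checks only that both forms read the same data; it does not model the performance or STLF behaviour discussed above.

```c
#include <immintrin.h>
#include <string.h>

/* Unsplit form: a single 256-bit unaligned load (vmovups into a ymm). */
__attribute__((target("avx")))
static __m256 load_256_unsplit(const float *p)
{
    return _mm256_loadu_ps(p);
}

/* Split form: two 128-bit unaligned loads, with the high half inserted
   via vinsertf128. This is the shape of code used when a 256-bit
   unaligned load is not considered optimal for the tuning target. */
__attribute__((target("avx")))
static __m256 load_256_split(const float *p)
{
    __m128 lo = _mm_loadu_ps(p);
    __m128 hi = _mm_loadu_ps(p + 4);
    return _mm256_insertf128_ps(_mm256_castps128_ps256(lo), hi, 1);
}

/* Store both results to memory and compare them byte for byte. */
__attribute__((target("avx")))
static int avx_compare(const float *p)
{
    float a[8], b[8];
    _mm256_storeu_ps(a, load_256_unsplit(p));
    _mm256_storeu_ps(b, load_256_split(p));
    return memcmp(a, b, sizeof a) == 0;
}

/* Returns 1 if both forms load identical data from a misaligned address,
   0 if they differ, and -1 if the CPU does not support AVX. */
int split_matches_unsplit(void)
{
    float buf[9];
    for (int i = 0; i < 9; i++)
        buf[i] = (float)i;
    if (!__builtin_cpu_supports("avx"))
        return -1;
    /* buf + 1 is offset by 4 bytes, so it is not 32-byte aligned;
       it reads buf[1]..buf[8], which stays inside the array. */
    return avx_compare(buf + 1);
}
```

The runtime __builtin_cpu_supports check keeps the sketch from executing AVX instructions on a CPU without AVX, while the target("avx") attribute lets it compile without passing -mavx globally.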
--
BR,
Hongtao