On Thu, Jan 28, 2021 at 9:18 PM H.J. Lu <hjl.to...@gmail.com> wrote:
>
> On Thu, Jan 28, 2021 at 1:21 AM Richard Biener via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > On Thu, Jan 28, 2021 at 7:32 AM Hongtao Liu via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > Hi:
> > > >    GCC 11 will be the system GCC 2 years from now, and the processors
> > > > in use by then shouldn't even need to split a 256-bit vector into
> > > > two 128-bit vectors.
> > > >    E.g., testing SPEC2017 with the 2 options below on Zen3/ICL shows
> > > > that Option B is better than Option A.
> > > Option A:
> > > -march=x86-64 -mtune=generic -mavx2 -mfma -Ofast
> > >
> > > Option B:
> > > Option A + 
> > > -mtune-ctrl="256_unaligned_load_optimal,256_unaligned_store_optimal"
> > >
> > > >   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> >
> > Given the explicit list for unaligned loads it's a no-brainer to change that
> > for X86_TUNE_AVX256_UNALIGNED_LOAD_OPTIMAL.  Given both
> > BDVER and ZNVER1 are listed for X86_TUNE_AVX256_UNALIGNED_STORE_OPTIMAL
> > we should try to benchmark the effect on ZNVER1 - Martin, do we still
> > have a znver1 machine around?
>
> They are also turned on for Sandybridge.  I don't believe we should keep it
> in GCC 11 to penalize today's CPUs as well as CPUs in 2024.
>
I agree with H.J., and I would also like to hear Uros' opinion.
> > Note that with the settings differing in a way to split stores but not
> > to split loads, loading a just stored value can cause bad STLF and quite a
> > performance hit (since znver1 has 128bit data paths that shouldn't
> > be an issue there but it would have an issue for actually aligned data
> > on CPUs with 256bit data paths).
> >
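
For illustration, a minimal sketch of the kind of code this affects. The
functions below are hypothetical (not taken from the patch or from
SPEC2017), and the codegen described is what I would typically expect
rather than something measured here. Built with the Option A flags, the
vectorizer can use 256-bit accesses for both loops; without the
256_unaligned_*_optimal tunings, each unaligned 256-bit access is emitted
as two 128-bit halves. If only the stores in scale() were split, the
256-bit loads in sum() would read data written by pairs of 128-bit
stores, which is the store-to-load forwarding case described above.

```c
#include <stddef.h>

/* Hypothetical kernel: scale n floats from src into dst.
   Neither pointer is assumed to be 32-byte aligned, so a vectorized
   version has to use unaligned vector loads and stores. */
void scale(float *dst, const float *src, size_t n, float k)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

/* Hypothetical consumer: reads back a buffer, e.g. one just written by
   scale(). When vectorized, its loads hit recently stored data. */
float sum(const float *p, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += p[i];
    return s;
}
```

Comparing the assembly (-S) for Option A and Option B should show whether
the accesses to dst and src are single 256-bit moves or 128-bit pairs.
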
> > Thanks,
> > Richard.
> >
> > >   Ok for trunk?
> > >
> > >
> > >
> > >
> > > --
> > > BR,
> > > Hongtao
>
>
>
> --
> H.J.



-- 
BR,
Hongtao
