Re: [PATCH][X86] Enable X86_TUNE_AVX256_UNALIGNED_{LOAD, STORE}_OPTIMAL for generic tune [PR target/98172]

H.J. Lu via Gcc-patches Thu, 28 Jan 2021 05:18:31 -0800

On Thu, Jan 28, 2021 at 1:21 AM Richard Biener via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Thu, Jan 28, 2021 at 7:32 AM Hongtao Liu via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Hi:
> >    GCC 11 will be the system GCC 2 years from now, and the processors
> > of that time shouldn't even need to split a 256-bit vector into two
> > 128-bit vectors.
> >    I.e., testing SPEC2017 with the 2 options below on Zen3/ICL shows
> > that Option B is better than Option A.
> > Option A:
> > -march=x86-64 -mtune=generic -mavx2 -mfma -Ofast
> >
> > Option B:
> > Option A + -mtune-ctrl="256_unaligned_load_optimal,256_unaligned_store_optimal"
> >
> >   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
>
> Given the explicit list for unaligned loads it's a no-brainer to change that
> for X86_TUNE_AVX256_UNALIGNED_LOAD_OPTIMAL.  Given both
> BDVER and ZNVER1 are listed for X86_TUNE_AVX256_UNALIGNED_STORE_OPTIMAL
> we should try to benchmark the effect on ZNVER1 - Martin, do we still
> have a znver1 machine around?


They are also turned on for Sandybridge.  I don't believe we should keep the
splitting in GCC 11 and penalize today's CPUs as well as CPUs in 2024.
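
As a rough illustration of what the two option sets quoted above change, here
is a minimal sketch of a loop that GCC would normally vectorize with 256-bit
unaligned accesses.  The function name scale_add and the file name
scale_add.c are hypothetical and are not taken from the patch or from
SPEC2017.

```c
#include <stddef.h>

/* Hypothetical kernel for illustration only.  Neither pointer is known
   to be 32-byte aligned, so the vectorizer has to use unaligned 256-bit
   loads and stores (or split them into 128-bit halves, depending on the
   tuning flags in effect).  */
void scale_add(float *dst, const float *src, float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = dst[i] + k * src[i];
}
```

Compiling it once per option set and comparing the assembly should show the
difference, for example:

  gcc -S -march=x86-64 -mtune=generic -mavx2 -mfma -Ofast scale_add.c -o a.s
  gcc -S -march=x86-64 -mtune=generic -mavx2 -mfma -Ofast \
      -mtune-ctrl="256_unaligned_load_optimal,256_unaligned_store_optimal" \
      scale_add.c -o b.s

With splitting in effect one would typically expect 128-bit vmovups paired
with vinsertf128/vextractf128, and single 256-bit vmovups otherwise, though
the exact code depends on the GCC version and its cost model.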

> Note that with the settings differing in a way to split stores but not to
> split loads, loading a just-stored value can cause bad STLF (store-to-load
> forwarding) and quite a performance hit (since znver1 has 128-bit data
> paths that shouldn't be an issue there, but it would be an issue for
> actually aligned data on CPUs with 256-bit data paths).
>
> Thanks,
> Richard.
>
> >   Ok for trunk?
> >
> >
> >
> >
> > --
> > BR,
> > Hongtao



-- 
H.J.
