Any comments?

On Wed, Nov 22, 2023 at 12:17 PM liuhongt <hongtao....@intel.com> wrote:
>
> From: "Zhang, Annita" <annita.zh...@intel.com>
>
> Avoid_fma_chain was enabled in m_SAPPHIRERAPIDS, m_ALDERLAKE and
> m_CORE_HYBRID. It can also be enabled in m_GENERIC to improve the
> performance of -march=x86-64-v3/v4, where -mtune=generic is set by
> default. One SPEC2017 benchmark, 510.parest_r, improves greatly from
> it. In our experiments, the single-thread run with -O2
> -march=x86-64-v3 improves by 26% on SPR and 15% on Zen3. Meanwhile,
> it didn't cause any notable regression on previous platforms,
> including Cascade Lake and Ice Lake Server.
>
> On znver4, it looks like fadd (3 cycles) is still faster than fma (4
> cycles). So in theory, avoid_fma_chain should also be better for
> znver4. And according to [1], enabling avoid_fma_chain is not a
> generic win on znver4?
>
> ----cut from [1]---------------
> I also added X86_TUNE_AVOID_256FMA_CHAINS. Since fma has improved in
> zen4 this flag may not be a win except for very specific benchmarks. I
> am still doing some more detailed testing here.
> -----cut end--------------
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-December/607962.html
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog
>
>         * config/i386/x86-tune.def (AVOID_256FMA_CHAINS): Add
>         m_GENERIC.
> ---
>  gcc/config/i386/x86-tune.def | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
> index 43fa9e8fd6d..a2e57e01550 100644
> --- a/gcc/config/i386/x86-tune.def
> +++ b/gcc/config/i386/x86-tune.def
> @@ -521,7 +521,7 @@ DEF_TUNE (X86_TUNE_AVOID_128FMA_CHAINS, "avoid_fma_chains", m_ZNVER1 | m_ZNVER2
>  /* X86_TUNE_AVOID_256FMA_CHAINS: Avoid creating loops with tight 256bit or
>     smaller FMA chain.  */
>  DEF_TUNE (X86_TUNE_AVOID_256FMA_CHAINS, "avoid_fma256_chains", m_ZNVER2 | m_ZNVER3
> -         | m_CORE_HYBRID | m_SAPPHIRERAPIDS | m_CORE_ATOM)
> +         | m_CORE_HYBRID | m_SAPPHIRERAPIDS | m_CORE_ATOM | m_GENERIC)
>
>  /* X86_TUNE_AVOID_512FMA_CHAINS: Avoid creating loops with tight 512bit or
>     smaller FMA chain.  */
> --
> 2.31.1
>

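To illustrate the latency argument in the quoted description, here is a
minimal sketch of a reduction loop of the kind this tuning is meant for.
It is a hypothetical example (the function name dot is mine), not code
from the patch or from 510.parest_r, and what the compiler actually
emits depends on the options and the GCC version.

```c
#include <stddef.h>

/* Hypothetical example: a scalar dot product with one accumulator.
   Not taken from the patch or from any benchmark.  */
double dot(const double *a, const double *b, size_t n)
{
  double acc = 0.0;
  for (size_t i = 0; i < n; i++)
    /* If this is contracted to acc = fma (a[i], b[i], acc), each
       iteration has to wait for the previous FMA to finish.  If it is
       kept as a multiply followed by an add, only the add sits on the
       loop-carried dependency through acc.  */
    acc += a[i] * b[i];
  return acc;
}
```

With the latencies quoted above (fadd 3 cycles, fma 4 cycles), the FMA
form would be bounded by roughly 4 cycles per iteration on the
accumulator dependency and the split form by roughly 3, because the
multiplies do not depend on acc and can overlap. This ignores unrolling
and vectorization, so treat it as a back-of-the-envelope estimate only.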
--
BR,
Hongtao