From: "Zhang, Annita" <annita.zh...@intel.com>

Avoid_fma_chain is already enabled for m_SAPPHIRERAPIDS, m_ALDERLAKE and
m_CORE_HYBRID. Enabling it for m_GENERIC as well improves the performance
of -march=x86-64-v3/v4, where -mtune=generic is set by default. The
SPEC2017 benchmark 510.parest_r benefits greatly from it: in our
experiments, a single-threaded run built with -O2 -march=x86-64-v3 improves
by 26% on SPR and by 15% on Zen3. Meanwhile, it did not cause any notable
regression on earlier platforms, including Cascade Lake and Ice Lake Server.
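
As an illustration of the kind of loop this tuning affects, here is a minimal sketch in C. It is not taken from the patch. `dot` is a reduction whose accumulator is carried across iterations. On a target with FMA, GCC may contract `sum += a[i] * b[i]` into a single fused multiply-add, which places the full FMA latency on the loop-carried dependency. Keeping the multiply and the add separate leaves only the add on that chain, because the multiply does not depend on `sum`. The helpers `chain_cycles_fused` and `chain_cycles_split` are hypothetical names introduced here; they model only that dependency chain and ignore throughput, unrolling, vectorization and pipeline start-up.

```c
#include <stddef.h>

/* Dot-product reduction: `sum` is carried from one iteration to the next,
   so each iteration must wait for the previous accumulate to finish.  */
double dot(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];   /* may be contracted into one FMA */
    return sum;
}

/* Simplified model (an assumption, not a measurement): if the accumulate
   is a fused multiply-add, every iteration adds one FMA latency to the
   loop-carried chain.  */
unsigned long chain_cycles_fused(unsigned long n, unsigned fma_latency)
{
    return n * fma_latency;
}

/* Same model with the multiply and add kept separate: the multiply is
   independent of the accumulator and can overlap with earlier adds, so
   only the add latency remains on the chain.  */
unsigned long chain_cycles_split(unsigned long n, unsigned add_latency)
{
    return n * add_latency;
}
```

For example, with a 4-cycle FMA and a 3-cycle add, the model gives 400 versus 300 chain cycles for 100 iterations. Real speedups depend on much more than this one chain, which is why the benchmark numbers matter.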

On znver4, it looks like fadd (3 cycles) is still faster than fma
(4 cycles), so in theory avoid_fma_chain should also be better for znver4.
And according to [1], enabling fma_chain is not a generic win on znver4?

----cut from [1]---------------
I also added X86_TUNE_AVOID_256FMA_CHAINS. Since fma has improved in zen4
this flag may not be a win except for very specific benchmarks. I am still
doing some more detailed testing here.
-----cut end--------------

[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-December/607962.html

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

gcc/ChangeLog

	* config/i386/x86-tune.def (AVOID_256FMA_CHAINS): Add m_GENERIC.
---
 gcc/config/i386/x86-tune.def | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index 43fa9e8fd6d..a2e57e01550 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -521,7 +521,7 @@ DEF_TUNE (X86_TUNE_AVOID_128FMA_CHAINS, "avoid_fma_chains", m_ZNVER1 | m_ZNVER2
 /* X86_TUNE_AVOID_256FMA_CHAINS: Avoid creating loops with tight 256bit or
    smaller FMA chain.  */
 DEF_TUNE (X86_TUNE_AVOID_256FMA_CHAINS, "avoid_fma256_chains", m_ZNVER2 | m_ZNVER3
-          | m_CORE_HYBRID | m_SAPPHIRERAPIDS | m_CORE_ATOM)
+          | m_CORE_HYBRID | m_SAPPHIRERAPIDS | m_CORE_ATOM | m_GENERIC)
 
 /* X86_TUNE_AVOID_512FMA_CHAINS: Avoid creating loops with tight 512bit or
    smaller FMA chain.  */
-- 
2.31.1