[PATCH] Set AVOID_256FMA_CHAINS TO m_GENERIC as it's generally good to new platforms

liuhongt Tue, 21 Nov 2023 20:18:02 -0800

From: "Zhang, Annita" <annita.zh...@intel.com>

Avoid_fma_chain was enabled in m_SAPPHIRERAPIDS, m_ALDERLAKE and
m_CORE_HYBRID. It can also be enabled in m_GENERIC to improve the
performance of -march=x86-64-v3/v4 with -mtune=generic set by
default. One SPEC2017 benchmark 510.parest_r can improve greatly due
to it. From the experiments, the single thread with -O2
-march=x86-64-v3 can improve 26% on SPR, and 15% on Zen3. Meanwhile,
it didn't cause notable regression in previous platforms including
Cascade Lake and Ice Lake Server.


On zenver4, it looks like fadd(3 cycles) is still fater than fma(4
cycles). So in theory, avoid_fma_chain should be also better for
znver4. And according to [1], enable fma_chain is not a generic win on
znver4?

----cut from [1]---------------
I also added X86_TUNE_AVOID_256FMA_CHAINS. Since fma has improved in
zen4 this flag may not be a win except for very specific benchmarks. I
am still doing some more detailed testing here.
-----cut end--------------

[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-December/607962.html

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

gcc/ChangeLog

        * config/i386/x86-tune.def (AVOID_256FMA_CHAINS): Add
        m_GENERIC.
---
 gcc/config/i386/x86-tune.def | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index 43fa9e8fd6d..a2e57e01550 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -521,7 +521,7 @@ DEF_TUNE (X86_TUNE_AVOID_128FMA_CHAINS, "avoid_fma_chains", 
m_ZNVER1 | m_ZNVER2
 /* X86_TUNE_AVOID_256FMA_CHAINS: Avoid creating loops with tight 256bit or
    smaller FMA chain.  */
 DEF_TUNE (X86_TUNE_AVOID_256FMA_CHAINS, "avoid_fma256_chains", m_ZNVER2 | 
m_ZNVER3
-         | m_CORE_HYBRID | m_SAPPHIRERAPIDS | m_CORE_ATOM)
+         | m_CORE_HYBRID | m_SAPPHIRERAPIDS | m_CORE_ATOM | m_GENERIC)
 
 /* X86_TUNE_AVOID_512FMA_CHAINS: Avoid creating loops with tight 512bit or
    smaller FMA chain.  */
-- 
2.31.1

[PATCH] Set AVOID_256FMA_CHAINS TO m_GENERIC as it's generally good to new platforms

Reply via email to