Re: [PATCH 1/2]AArch64: Add CMP+CSEL and CMP+CSET for cores that support it

Richard Earnshaw (lists) Wed, 11 Dec 2024 04:16:43 -0800

On 11/12/2024 09:54, Tamar Christina wrote:

-----Original Message-----
From: Richard Sandiford <richard.sandif...@arm.com>
Sent: Wednesday, December 11, 2024 9:50 AM
To: Tamar Christina <tamar.christ...@arm.com>
Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; Richard Earnshaw
<richard.earns...@arm.com>; ktkac...@gcc.gnu.org
Subject: Re: [PATCH 1/2]AArch64: Add CMP+CSEL and CMP+CSET for cores that
support it


Tamar Christina <tamar.christ...@arm.com> writes:

-----Original Message-----
From: Richard Sandiford <richard.sandif...@arm.com>
Sent: Wednesday, December 11, 2024 9:32 AM
To: Tamar Christina <tamar.christ...@arm.com>
Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; Richard Earnshaw
<richard.earns...@arm.com>; ktkac...@gcc.gnu.org
Subject: Re: [PATCH 1/2]AArch64: Add CMP+CSEL and CMP+CSET for cores that
support it

Tamar Christina <tamar.christ...@arm.com> writes:

Hi All,

GCC 15 added two new fusions CMP+CSEL and CMP+CSET.

This patch enables them for cores that support based on their Software
Optimization Guides and generically on Armv9-A.   Even if a core does not
support it there's no negative performance impact.


Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

        * config/aarch64/aarch64-fusion-pairs.def

(AARCH64_FUSE_NEOVERSE_BASE):

        New.
        * config/aarch64/tuning_models/cortexx925.h: Use it.
        * config/aarch64/tuning_models/generic_armv9_a.h: Use it.
        * config/aarch64/tuning_models/neoverse512tvb.h: Use it.
        * config/aarch64/tuning_models/neoversen2.h: Use it.
        * config/aarch64/tuning_models/neoversen3.h: Use it.
        * config/aarch64/tuning_models/neoversev1.h: Use it.
        * config/aarch64/tuning_models/neoversev2.h: Use it.
        * config/aarch64/tuning_models/neoversev3.h: Use it.
        * config/aarch64/tuning_models/neoversev3ae.h: Use it.

---

diff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def

b/gcc/config/aarch64/aarch64-fusion-pairs.def

index

f8413ab0c802c28290ebcc171bfd131622cb33be..0123430d988b96b20d20376

df9ae7a196031286d 100644

--- a/gcc/config/aarch64/aarch64-fusion-pairs.def
+++ b/gcc/config/aarch64/aarch64-fusion-pairs.def
@@ -45,4 +45,8 @@ AARCH64_FUSION_PAIR ("cmp+cset", CMP_CSET)
  /* Baseline fusion settings suitable for all cores.  */
  #define AARCH64_FUSE_BASE (AARCH64_FUSE_CMP_BRANCH |

AARCH64_FUSE_AES_AESMC)


+/* Baseline fusion settings suitable for all Neoverse cores.  */
+#define AARCH64_FUSE_NEOVERSE_BASE (AARCH64_FUSE_BASE |

AARCH64_FUSE_CMP_CSEL \

+                                   | AARCH64_FUSE_CMP_CSET)
+
  #define AARCH64_FUSE_MOVK (AARCH64_FUSE_MOV_MOVK |

AARCH64_FUSE_MOVK_MOVK)

diff --git a/gcc/config/aarch64/tuning_models/cortexx925.h

b/gcc/config/aarch64/tuning_models/cortexx925.h

index

eb9b89984b0472858bc08dba924c962ec4ba53bd..b3ae1576ade1f701b775496

def277230e193d20f 100644

--- a/gcc/config/aarch64/tuning_models/cortexx925.h
+++ b/gcc/config/aarch64/tuning_models/cortexx925.h
@@ -205,7 +205,7 @@ static const struct tune_params cortexx925_tunings =
      2 /* store_pred.  */
    }, /* memmov_cost.  */
    10, /* issue_rate  */
-  AARCH64_FUSE_BASE, /* fusible_ops  */
+  AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops  */
    "32:16",  /* function_align.  */
    "4",              /* jump_align.  */
    "32:16",  /* loop_align.  */
diff --git a/gcc/config/aarch64/tuning_models/generic_armv9_a.h

b/gcc/config/aarch64/tuning_models/generic_armv9_a.h

index

48353a59939d84647c6981d6d0551af7ce9df751..e971e645dfc4b805b7f994a1

5a5df7803ff4dc6c 100644

--- a/gcc/config/aarch64/tuning_models/generic_armv9_a.h
+++ b/gcc/config/aarch64/tuning_models/generic_armv9_a.h
@@ -236,7 +236,7 @@ static const struct tune_params

generic_armv9_a_tunings =

      1 /* store_pred.  */
    }, /* memmov_cost.  */
    3, /* issue_rate  */
-  AARCH64_FUSE_BASE, /* fusible_ops  */
+  AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops  */
    "32:16",  /* function_align.  */
    "4",              /* jump_align.  */
    "32:16",  /* loop_align.  */


Having AARCH64_FUSE_NEOVERSE_BASE seems like a good thing, but I think
we have to be careful about using it for generic tuning.  generic-armv9-a
mustn't become "generic Neoverse", since it's supposed to be good for
non-Neoverse (and non-Arm) Armv9-A cores as well.

So perhaps here we should expand the macro.  Alternatively, we could add
a comment saying that fusion macros for other CPUs can be added here as
well, if they are unlikely to have a negative impact on other cores.

Perhaps we should expand the macro for cortex-x925 as well.


Sorry that certainly was not the intention, I thought about naming it
AARCH64_FUSE_ARMV9_BASE instead,  but since I was also using it for
non-Armv9 cores I thought this would be equally confusing.  But perhaps
In light of your comments it may make more sense naming wise?


Yeah, I'd also wondered about suggesting that, but then saw that there
were quite a few Armv9-A cores that don't have the flag.

So I think AARCH64_FUSE_NEOVERSE_BASE is a good name, and is what we
should use for the Neoverse tunings.  But perhaps we should expand the
macro for the two files above.  If anyone adds any new _FUSE flags for
Neoverse cores later, there should still be an explicit decision whether
to include that flag for cortex-x925 and generic-armv9-a.

If we end up with multiple cores of the same family that want the same
flags, we can add a new macro for that family then, like you're doing
here for Neoverse.

(All IMO of course.)


Understood, works for me. I'll leave it a bit for people to comment on
otherwise will proceed with the expansion for generic.

It seems to me that as long as this is beneficial on Neoverse cores andnot harmful on the vast majority of other cores, then having it in thebase should be ok. But that should be mentioned in a comment somewhere.

R.

Cheers,
Tamar


Thanks,
Richard

Re: [PATCH 1/2]AArch64: Add CMP+CSEL and CMP+CSET for cores that support it

Reply via email to