On 11/12/2024 09:54, Tamar Christina wrote:
-----Original Message-----
From: Richard Sandiford <richard.sandif...@arm.com>
Sent: Wednesday, December 11, 2024 9:50 AM
To: Tamar Christina <tamar.christ...@arm.com>
Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; Richard Earnshaw
<richard.earns...@arm.com>; ktkac...@gcc.gnu.org
Subject: Re: [PATCH 1/2]AArch64: Add CMP+CSEL and CMP+CSET for cores that
support it
Tamar Christina <tamar.christ...@arm.com> writes:
-----Original Message-----
From: Richard Sandiford <richard.sandif...@arm.com>
Sent: Wednesday, December 11, 2024 9:32 AM
To: Tamar Christina <tamar.christ...@arm.com>
Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; Richard Earnshaw
<richard.earns...@arm.com>; ktkac...@gcc.gnu.org
Subject: Re: [PATCH 1/2]AArch64: Add CMP+CSEL and CMP+CSET for cores that
support it
Tamar Christina <tamar.christ...@arm.com> writes:
Hi All,
GCC 15 added two new fusions CMP+CSEL and CMP+CSET.
This patch enables them for cores that support based on their Software
Optimization Guides and generically on Armv9-A. Even if a core does not
support it there's no negative performance impact.
Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
Ok for master?
Thanks,
Tamar
gcc/ChangeLog:
* config/aarch64/aarch64-fusion-pairs.def
(AARCH64_FUSE_NEOVERSE_BASE):
New.
* config/aarch64/tuning_models/cortexx925.h: Use it.
* config/aarch64/tuning_models/generic_armv9_a.h: Use it.
* config/aarch64/tuning_models/neoverse512tvb.h: Use it.
* config/aarch64/tuning_models/neoversen2.h: Use it.
* config/aarch64/tuning_models/neoversen3.h: Use it.
* config/aarch64/tuning_models/neoversev1.h: Use it.
* config/aarch64/tuning_models/neoversev2.h: Use it.
* config/aarch64/tuning_models/neoversev3.h: Use it.
* config/aarch64/tuning_models/neoversev3ae.h: Use it.
---
diff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def
b/gcc/config/aarch64/aarch64-fusion-pairs.def
index
f8413ab0c802c28290ebcc171bfd131622cb33be..0123430d988b96b20d20376
df9ae7a196031286d 100644
--- a/gcc/config/aarch64/aarch64-fusion-pairs.def
+++ b/gcc/config/aarch64/aarch64-fusion-pairs.def
@@ -45,4 +45,8 @@ AARCH64_FUSION_PAIR ("cmp+cset", CMP_CSET)
/* Baseline fusion settings suitable for all cores. */
#define AARCH64_FUSE_BASE (AARCH64_FUSE_CMP_BRANCH |
AARCH64_FUSE_AES_AESMC)
+/* Baseline fusion settings suitable for all Neoverse cores. */
+#define AARCH64_FUSE_NEOVERSE_BASE (AARCH64_FUSE_BASE |
AARCH64_FUSE_CMP_CSEL \
+ | AARCH64_FUSE_CMP_CSET)
+
#define AARCH64_FUSE_MOVK (AARCH64_FUSE_MOV_MOVK |
AARCH64_FUSE_MOVK_MOVK)
diff --git a/gcc/config/aarch64/tuning_models/cortexx925.h
b/gcc/config/aarch64/tuning_models/cortexx925.h
index
eb9b89984b0472858bc08dba924c962ec4ba53bd..b3ae1576ade1f701b775496
def277230e193d20f 100644
--- a/gcc/config/aarch64/tuning_models/cortexx925.h
+++ b/gcc/config/aarch64/tuning_models/cortexx925.h
@@ -205,7 +205,7 @@ static const struct tune_params cortexx925_tunings =
2 /* store_pred. */
}, /* memmov_cost. */
10, /* issue_rate */
- AARCH64_FUSE_BASE, /* fusible_ops */
+ AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops */
"32:16", /* function_align. */
"4", /* jump_align. */
"32:16", /* loop_align. */
diff --git a/gcc/config/aarch64/tuning_models/generic_armv9_a.h
b/gcc/config/aarch64/tuning_models/generic_armv9_a.h
index
48353a59939d84647c6981d6d0551af7ce9df751..e971e645dfc4b805b7f994a1
5a5df7803ff4dc6c 100644
--- a/gcc/config/aarch64/tuning_models/generic_armv9_a.h
+++ b/gcc/config/aarch64/tuning_models/generic_armv9_a.h
@@ -236,7 +236,7 @@ static const struct tune_params
generic_armv9_a_tunings =
1 /* store_pred. */
}, /* memmov_cost. */
3, /* issue_rate */
- AARCH64_FUSE_BASE, /* fusible_ops */
+ AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops */
"32:16", /* function_align. */
"4", /* jump_align. */
"32:16", /* loop_align. */
Having AARCH64_FUSE_NEOVERSE_BASE seems like a good thing, but I think
we have to be careful about using it for generic tuning. generic-armv9-a
mustn't become "generic Neoverse", since it's supposed to be good for
non-Neoverse (and non-Arm) Armv9-A cores as well.
So perhaps here we should expand the macro. Alternatively, we could add
a comment saying that fusion macros for other CPUs can be added here as
well, if they are unlikely to have a negative impact on other cores.
Perhaps we should expand the macro for cortex-x925 as well.
Sorry that certainly was not the intention, I thought about naming it
AARCH64_FUSE_ARMV9_BASE instead, but since I was also using it for
non-Armv9 cores I thought this would be equally confusing. But perhaps
In light of your comments it may make more sense naming wise?
Yeah, I'd also wondered about suggesting that, but then saw that there
were quite a few Armv9-A cores that don't have the flag.
So I think AARCH64_FUSE_NEOVERSE_BASE is a good name, and is what we
should use for the Neoverse tunings. But perhaps we should expand the
macro for the two files above. If anyone adds any new _FUSE flags for
Neoverse cores later, there should still be an explicit decision whether
to include that flag for cortex-x925 and generic-armv9-a.
If we end up with multiple cores of the same family that want the same
flags, we can add a new macro for that family then, like you're doing
here for Neoverse.
(All IMO of course.)
Understood, works for me. I'll leave it a bit for people to comment on
otherwise will proceed with the expansion for generic.
It seems to me that as long as this is beneficial on Neoverse cores and
not harmful on the vast majority of other cores, then having it in the
base should be ok. But that should be mentioned in a comment somewhere.
R.
Cheers,
Tamar
Thanks,
Richard