Thanks for doing this Tamar, > On 11 Dec 2024, at 10:54, Tamar Christina <tamar.christ...@arm.com> wrote: > >> -----Original Message----- >> From: Richard Sandiford <richard.sandif...@arm.com> >> Sent: Wednesday, December 11, 2024 9:50 AM >> To: Tamar Christina <tamar.christ...@arm.com> >> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; Richard Earnshaw >> <richard.earns...@arm.com>; ktkac...@gcc.gnu.org >> Subject: Re: [PATCH 1/2]AArch64: Add CMP+CSEL and CMP+CSET for cores that >> support it >> >> Tamar Christina <tamar.christ...@arm.com> writes: >>>> -----Original Message----- >>>> From: Richard Sandiford <richard.sandif...@arm.com> >>>> Sent: Wednesday, December 11, 2024 9:32 AM >>>> To: Tamar Christina <tamar.christ...@arm.com> >>>> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; Richard Earnshaw >>>> <richard.earns...@arm.com>; ktkac...@gcc.gnu.org >>>> Subject: Re: [PATCH 1/2]AArch64: Add CMP+CSEL and CMP+CSET for cores that >>>> support it >>>> >>>> Tamar Christina <tamar.christ...@arm.com> writes: >>>>> Hi All, >>>>> >>>>> GCC 15 added two new fusions CMP+CSEL and CMP+CSET. >>>>> >>>>> This patch enables them for cores that support based on their Software >>>>> Optimization Guides and generically on Armv9-A. Even if a core does not >>>>> support it there's no negative performance impact. >>>>> >>>>> >>>>> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. >>>>> >>>>> Ok for master? >>>>> >>>>> Thanks, >>>>> Tamar >>>>> >>>>> gcc/ChangeLog: >>>>> >>>>> * config/aarch64/aarch64-fusion-pairs.def >>>> (AARCH64_FUSE_NEOVERSE_BASE): >>>>> New. >>>>> * config/aarch64/tuning_models/cortexx925.h: Use it. >>>>> * config/aarch64/tuning_models/generic_armv9_a.h: Use it. >>>>> * config/aarch64/tuning_models/neoverse512tvb.h: Use it. >>>>> * config/aarch64/tuning_models/neoversen2.h: Use it. >>>>> * config/aarch64/tuning_models/neoversen3.h: Use it. >>>>> * config/aarch64/tuning_models/neoversev1.h: Use it. >>>>> * config/aarch64/tuning_models/neoversev2.h: Use it. >>>>> * config/aarch64/tuning_models/neoversev3.h: Use it. >>>>> * config/aarch64/tuning_models/neoversev3ae.h: Use it. >>>>> >>>>> --- >>>>> >>>>> diff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def >>>> b/gcc/config/aarch64/aarch64-fusion-pairs.def >>>>> index >>>> >> f8413ab0c802c28290ebcc171bfd131622cb33be..0123430d988b96b20d20376 >>>> df9ae7a196031286d 100644 >>>>> --- a/gcc/config/aarch64/aarch64-fusion-pairs.def >>>>> +++ b/gcc/config/aarch64/aarch64-fusion-pairs.def >>>>> @@ -45,4 +45,8 @@ AARCH64_FUSION_PAIR ("cmp+cset", CMP_CSET) >>>>> /* Baseline fusion settings suitable for all cores. */ >>>>> #define AARCH64_FUSE_BASE (AARCH64_FUSE_CMP_BRANCH | >>>> AARCH64_FUSE_AES_AESMC) >>>>> >>>>> +/* Baseline fusion settings suitable for all Neoverse cores. */ >>>>> +#define AARCH64_FUSE_NEOVERSE_BASE (AARCH64_FUSE_BASE | >>>> AARCH64_FUSE_CMP_CSEL \ >>>>> + | AARCH64_FUSE_CMP_CSET) >>>>> + >>>>> #define AARCH64_FUSE_MOVK (AARCH64_FUSE_MOV_MOVK | >>>> AARCH64_FUSE_MOVK_MOVK) >>>>> diff --git a/gcc/config/aarch64/tuning_models/cortexx925.h >>>> b/gcc/config/aarch64/tuning_models/cortexx925.h >>>>> index >>>> >> eb9b89984b0472858bc08dba924c962ec4ba53bd..b3ae1576ade1f701b775496 >>>> def277230e193d20f 100644 >>>>> --- a/gcc/config/aarch64/tuning_models/cortexx925.h >>>>> +++ b/gcc/config/aarch64/tuning_models/cortexx925.h >>>>> @@ -205,7 +205,7 @@ static const struct tune_params cortexx925_tunings = >>>>> 2 /* store_pred. */ >>>>> }, /* memmov_cost. */ >>>>> 10, /* issue_rate */ >>>>> - AARCH64_FUSE_BASE, /* fusible_ops */ >>>>> + AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops */ >>>>> "32:16", /* function_align. */ >>>>> "4", /* jump_align. */ >>>>> "32:16", /* loop_align. */ >>>>> diff --git a/gcc/config/aarch64/tuning_models/generic_armv9_a.h >>>> b/gcc/config/aarch64/tuning_models/generic_armv9_a.h >>>>> index >>>> >> 48353a59939d84647c6981d6d0551af7ce9df751..e971e645dfc4b805b7f994a1 >>>> 5a5df7803ff4dc6c 100644 >>>>> --- a/gcc/config/aarch64/tuning_models/generic_armv9_a.h >>>>> +++ b/gcc/config/aarch64/tuning_models/generic_armv9_a.h >>>>> @@ -236,7 +236,7 @@ static const struct tune_params >>>> generic_armv9_a_tunings = >>>>> 1 /* store_pred. */ >>>>> }, /* memmov_cost. */ >>>>> 3, /* issue_rate */ >>>>> - AARCH64_FUSE_BASE, /* fusible_ops */ >>>>> + AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops */ >>>>> "32:16", /* function_align. */ >>>>> "4", /* jump_align. */ >>>>> "32:16", /* loop_align. */ >>>> >>>> Having AARCH64_FUSE_NEOVERSE_BASE seems like a good thing, but I think >>>> we have to be careful about using it for generic tuning. generic-armv9-a >>>> mustn't become "generic Neoverse", since it's supposed to be good for >>>> non-Neoverse (and non-Arm) Armv9-A cores as well. >>>> >>>> So perhaps here we should expand the macro. Alternatively, we could add >>>> a comment saying that fusion macros for other CPUs can be added here as >>>> well, if they are unlikely to have a negative impact on other cores. >>>> >>>> Perhaps we should expand the macro for cortex-x925 as well. >>> >>> Sorry that certainly was not the intention, I thought about naming it >>> AARCH64_FUSE_ARMV9_BASE instead, but since I was also using it for >>> non-Armv9 cores I thought this would be equally confusing. But perhaps >>> In light of your comments it may make more sense naming wise? >> >> Yeah, I'd also wondered about suggesting that, but then saw that there >> were quite a few Armv9-A cores that don't have the flag. >> >> So I think AARCH64_FUSE_NEOVERSE_BASE is a good name, and is what we >> should use for the Neoverse tunings. But perhaps we should expand the >> macro for the two files above. If anyone adds any new _FUSE flags for >> Neoverse cores later, there should still be an explicit decision whether >> to include that flag for cortex-x925 and generic-armv9-a. >> >> If we end up with multiple cores of the same family that want the same >> flags, we can add a new macro for that family then, like you're doing >> here for Neoverse. >> >> (All IMO of course.) > > Understood, works for me. I'll leave it a bit for people to comment on > otherwise will proceed with the expansion for generic.
I’m happy to group these under a common flag, but I think that NEOVERSE in the name makes it unnecessary specific. But I also can’t come up with a nice-sounding name to suggest instead (these seem to be “big core” flags) and since this is just an internal identifier rather than a user-facing name I’m not too fussed about it. Thanks, Kyrill > > Cheers, > Tamar > >> >> Thanks, >> Richard >