On Wed, Dec 11, 2024 at 4:17 AM Richard Earnshaw (lists) <richard.earns...@arm.com> wrote: > > On 11/12/2024 09:54, Tamar Christina wrote: > >> -----Original Message----- > >> From: Richard Sandiford <richard.sandif...@arm.com> > >> Sent: Wednesday, December 11, 2024 9:50 AM > >> To: Tamar Christina <tamar.christ...@arm.com> > >> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; Richard Earnshaw > >> <richard.earns...@arm.com>; ktkac...@gcc.gnu.org > >> Subject: Re: [PATCH 1/2]AArch64: Add CMP+CSEL and CMP+CSET for cores that > >> support it > >> > >> Tamar Christina <tamar.christ...@arm.com> writes: > >>>> -----Original Message----- > >>>> From: Richard Sandiford <richard.sandif...@arm.com> > >>>> Sent: Wednesday, December 11, 2024 9:32 AM > >>>> To: Tamar Christina <tamar.christ...@arm.com> > >>>> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; Richard Earnshaw > >>>> <richard.earns...@arm.com>; ktkac...@gcc.gnu.org > >>>> Subject: Re: [PATCH 1/2]AArch64: Add CMP+CSEL and CMP+CSET for cores that > >>>> support it > >>>> > >>>> Tamar Christina <tamar.christ...@arm.com> writes: > >>>>> Hi All, > >>>>> > >>>>> GCC 15 added two new fusions CMP+CSEL and CMP+CSET. > >>>>> > >>>>> This patch enables them for cores that support based on their Software > >>>>> Optimization Guides and generically on Armv9-A. Even if a core does > >>>>> not > >>>>> support it there's no negative performance impact. > >>>>> > >>>>> > >>>>> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > >>>>> > >>>>> Ok for master? > >>>>> > >>>>> Thanks, > >>>>> Tamar > >>>>> > >>>>> gcc/ChangeLog: > >>>>> > >>>>> * config/aarch64/aarch64-fusion-pairs.def > >>>> (AARCH64_FUSE_NEOVERSE_BASE): > >>>>> New. > >>>>> * config/aarch64/tuning_models/cortexx925.h: Use it. > >>>>> * config/aarch64/tuning_models/generic_armv9_a.h: Use it. > >>>>> * config/aarch64/tuning_models/neoverse512tvb.h: Use it. > >>>>> * config/aarch64/tuning_models/neoversen2.h: Use it. > >>>>> * config/aarch64/tuning_models/neoversen3.h: Use it. > >>>>> * config/aarch64/tuning_models/neoversev1.h: Use it. > >>>>> * config/aarch64/tuning_models/neoversev2.h: Use it. > >>>>> * config/aarch64/tuning_models/neoversev3.h: Use it. > >>>>> * config/aarch64/tuning_models/neoversev3ae.h: Use it. > >>>>> > >>>>> --- > >>>>> > >>>>> diff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def > >>>> b/gcc/config/aarch64/aarch64-fusion-pairs.def > >>>>> index > >>>> > >> f8413ab0c802c28290ebcc171bfd131622cb33be..0123430d988b96b20d20376 > >>>> df9ae7a196031286d 100644 > >>>>> --- a/gcc/config/aarch64/aarch64-fusion-pairs.def > >>>>> +++ b/gcc/config/aarch64/aarch64-fusion-pairs.def > >>>>> @@ -45,4 +45,8 @@ AARCH64_FUSION_PAIR ("cmp+cset", CMP_CSET) > >>>>> /* Baseline fusion settings suitable for all cores. */ > >>>>> #define AARCH64_FUSE_BASE (AARCH64_FUSE_CMP_BRANCH | > >>>> AARCH64_FUSE_AES_AESMC) > >>>>> > >>>>> +/* Baseline fusion settings suitable for all Neoverse cores. */ > >>>>> +#define AARCH64_FUSE_NEOVERSE_BASE (AARCH64_FUSE_BASE | > >>>> AARCH64_FUSE_CMP_CSEL \ > >>>>> + | AARCH64_FUSE_CMP_CSET) > >>>>> + > >>>>> #define AARCH64_FUSE_MOVK (AARCH64_FUSE_MOV_MOVK | > >>>> AARCH64_FUSE_MOVK_MOVK) > >>>>> diff --git a/gcc/config/aarch64/tuning_models/cortexx925.h > >>>> b/gcc/config/aarch64/tuning_models/cortexx925.h > >>>>> index > >>>> > >> eb9b89984b0472858bc08dba924c962ec4ba53bd..b3ae1576ade1f701b775496 > >>>> def277230e193d20f 100644 > >>>>> --- a/gcc/config/aarch64/tuning_models/cortexx925.h > >>>>> +++ b/gcc/config/aarch64/tuning_models/cortexx925.h > >>>>> @@ -205,7 +205,7 @@ static const struct tune_params cortexx925_tunings = > >>>>> 2 /* store_pred. */ > >>>>> }, /* memmov_cost. */ > >>>>> 10, /* issue_rate */ > >>>>> - AARCH64_FUSE_BASE, /* fusible_ops */ > >>>>> + AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops */ > >>>>> "32:16", /* function_align. */ > >>>>> "4", /* jump_align. */ > >>>>> "32:16", /* loop_align. */ > >>>>> diff --git a/gcc/config/aarch64/tuning_models/generic_armv9_a.h > >>>> b/gcc/config/aarch64/tuning_models/generic_armv9_a.h > >>>>> index > >>>> > >> 48353a59939d84647c6981d6d0551af7ce9df751..e971e645dfc4b805b7f994a1 > >>>> 5a5df7803ff4dc6c 100644 > >>>>> --- a/gcc/config/aarch64/tuning_models/generic_armv9_a.h > >>>>> +++ b/gcc/config/aarch64/tuning_models/generic_armv9_a.h > >>>>> @@ -236,7 +236,7 @@ static const struct tune_params > >>>> generic_armv9_a_tunings = > >>>>> 1 /* store_pred. */ > >>>>> }, /* memmov_cost. */ > >>>>> 3, /* issue_rate */ > >>>>> - AARCH64_FUSE_BASE, /* fusible_ops */ > >>>>> + AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops */ > >>>>> "32:16", /* function_align. */ > >>>>> "4", /* jump_align. */ > >>>>> "32:16", /* loop_align. */ > >>>> > >>>> Having AARCH64_FUSE_NEOVERSE_BASE seems like a good thing, but I think > >>>> we have to be careful about using it for generic tuning. generic-armv9-a > >>>> mustn't become "generic Neoverse", since it's supposed to be good for > >>>> non-Neoverse (and non-Arm) Armv9-A cores as well. > >>>> > >>>> So perhaps here we should expand the macro. Alternatively, we could add > >>>> a comment saying that fusion macros for other CPUs can be added here as > >>>> well, if they are unlikely to have a negative impact on other cores. > >>>> > >>>> Perhaps we should expand the macro for cortex-x925 as well. > >>> > >>> Sorry that certainly was not the intention, I thought about naming it > >>> AARCH64_FUSE_ARMV9_BASE instead, but since I was also using it for > >>> non-Armv9 cores I thought this would be equally confusing. But perhaps > >>> In light of your comments it may make more sense naming wise? > >> > >> Yeah, I'd also wondered about suggesting that, but then saw that there > >> were quite a few Armv9-A cores that don't have the flag. > >> > >> So I think AARCH64_FUSE_NEOVERSE_BASE is a good name, and is what we > >> should use for the Neoverse tunings. But perhaps we should expand the > >> macro for the two files above. If anyone adds any new _FUSE flags for > >> Neoverse cores later, there should still be an explicit decision whether > >> to include that flag for cortex-x925 and generic-armv9-a. > >> > >> If we end up with multiple cores of the same family that want the same > >> flags, we can add a new macro for that family then, like you're doing > >> here for Neoverse. > >> > >> (All IMO of course.) > > > > Understood, works for me. I'll leave it a bit for people to comment on > > otherwise will proceed with the expansion for generic. > > > > It seems to me that as long as this is beneficial on Neoverse cores and > not harmful on the vast majority of other cores, then having it in the > base should be ok. But that should be mentioned in a comment somewhere.
>From a qualcomm point of view, I think enabling this won't hurt either. So I am ok enabling this for generic-armv9-a too. Thanks, Andrew > > R. > > > Cheers, > > Tamar > > > >> > >> Thanks, > >> Richard > > >