Re: [PATCH 1/2]AArch64: Add CMP+CSEL and CMP+CSET for cores that support it

Andrew Pinski Thu, 12 Dec 2024 09:36:32 -0800

On Wed, Dec 11, 2024 at 4:17 AM Richard Earnshaw (lists)
<richard.earns...@arm.com> wrote:
>
> On 11/12/2024 09:54, Tamar Christina wrote:
> >> -----Original Message-----
> >> From: Richard Sandiford <richard.sandif...@arm.com>
> >> Sent: Wednesday, December 11, 2024 9:50 AM
> >> To: Tamar Christina <tamar.christ...@arm.com>
> >> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; Richard Earnshaw
> >> <richard.earns...@arm.com>; ktkac...@gcc.gnu.org
> >> Subject: Re: [PATCH 1/2]AArch64: Add CMP+CSEL and CMP+CSET for cores that
> >> support it
> >>
> >> Tamar Christina <tamar.christ...@arm.com> writes:
> >>>> -----Original Message-----
> >>>> From: Richard Sandiford <richard.sandif...@arm.com>
> >>>> Sent: Wednesday, December 11, 2024 9:32 AM
> >>>> To: Tamar Christina <tamar.christ...@arm.com>
> >>>> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; Richard Earnshaw
> >>>> <richard.earns...@arm.com>; ktkac...@gcc.gnu.org
> >>>> Subject: Re: [PATCH 1/2]AArch64: Add CMP+CSEL and CMP+CSET for cores that
> >>>> support it
> >>>>
> >>>> Tamar Christina <tamar.christ...@arm.com> writes:
> >>>>> Hi All,
> >>>>>
> >>>>> GCC 15 added two new fusions CMP+CSEL and CMP+CSET.
> >>>>>
> >>>>> This patch enables them for cores that support based on their Software
> >>>>> Optimization Guides and generically on Armv9-A.   Even if a core does 
> >>>>> not
> >>>>> support it there's no negative performance impact.
> >>>>>
> >>>>>
> >>>>> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >>>>>
> >>>>> Ok for master?
> >>>>>
> >>>>> Thanks,
> >>>>> Tamar
> >>>>>
> >>>>> gcc/ChangeLog:
> >>>>>
> >>>>>   * config/aarch64/aarch64-fusion-pairs.def
> >>>> (AARCH64_FUSE_NEOVERSE_BASE):
> >>>>>   New.
> >>>>>   * config/aarch64/tuning_models/cortexx925.h: Use it.
> >>>>>   * config/aarch64/tuning_models/generic_armv9_a.h: Use it.
> >>>>>   * config/aarch64/tuning_models/neoverse512tvb.h: Use it.
> >>>>>   * config/aarch64/tuning_models/neoversen2.h: Use it.
> >>>>>   * config/aarch64/tuning_models/neoversen3.h: Use it.
> >>>>>   * config/aarch64/tuning_models/neoversev1.h: Use it.
> >>>>>   * config/aarch64/tuning_models/neoversev2.h: Use it.
> >>>>>   * config/aarch64/tuning_models/neoversev3.h: Use it.
> >>>>>   * config/aarch64/tuning_models/neoversev3ae.h: Use it.
> >>>>>
> >>>>> ---
> >>>>>
> >>>>> diff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def
> >>>> b/gcc/config/aarch64/aarch64-fusion-pairs.def
> >>>>> index
> >>>>
> >> f8413ab0c802c28290ebcc171bfd131622cb33be..0123430d988b96b20d20376
> >>>> df9ae7a196031286d 100644
> >>>>> --- a/gcc/config/aarch64/aarch64-fusion-pairs.def
> >>>>> +++ b/gcc/config/aarch64/aarch64-fusion-pairs.def
> >>>>> @@ -45,4 +45,8 @@ AARCH64_FUSION_PAIR ("cmp+cset", CMP_CSET)
> >>>>>   /* Baseline fusion settings suitable for all cores.  */
> >>>>>   #define AARCH64_FUSE_BASE (AARCH64_FUSE_CMP_BRANCH |
> >>>> AARCH64_FUSE_AES_AESMC)
> >>>>>
> >>>>> +/* Baseline fusion settings suitable for all Neoverse cores.  */
> >>>>> +#define AARCH64_FUSE_NEOVERSE_BASE (AARCH64_FUSE_BASE |
> >>>> AARCH64_FUSE_CMP_CSEL \
> >>>>> +                             | AARCH64_FUSE_CMP_CSET)
> >>>>> +
> >>>>>   #define AARCH64_FUSE_MOVK (AARCH64_FUSE_MOV_MOVK |
> >>>> AARCH64_FUSE_MOVK_MOVK)
> >>>>> diff --git a/gcc/config/aarch64/tuning_models/cortexx925.h
> >>>> b/gcc/config/aarch64/tuning_models/cortexx925.h
> >>>>> index
> >>>>
> >> eb9b89984b0472858bc08dba924c962ec4ba53bd..b3ae1576ade1f701b775496
> >>>> def277230e193d20f 100644
> >>>>> --- a/gcc/config/aarch64/tuning_models/cortexx925.h
> >>>>> +++ b/gcc/config/aarch64/tuning_models/cortexx925.h
> >>>>> @@ -205,7 +205,7 @@ static const struct tune_params cortexx925_tunings =
> >>>>>       2 /* store_pred.  */
> >>>>>     }, /* memmov_cost.  */
> >>>>>     10, /* issue_rate  */
> >>>>> -  AARCH64_FUSE_BASE, /* fusible_ops  */
> >>>>> +  AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops  */
> >>>>>     "32:16",      /* function_align.  */
> >>>>>     "4",          /* jump_align.  */
> >>>>>     "32:16",      /* loop_align.  */
> >>>>> diff --git a/gcc/config/aarch64/tuning_models/generic_armv9_a.h
> >>>> b/gcc/config/aarch64/tuning_models/generic_armv9_a.h
> >>>>> index
> >>>>
> >> 48353a59939d84647c6981d6d0551af7ce9df751..e971e645dfc4b805b7f994a1
> >>>> 5a5df7803ff4dc6c 100644
> >>>>> --- a/gcc/config/aarch64/tuning_models/generic_armv9_a.h
> >>>>> +++ b/gcc/config/aarch64/tuning_models/generic_armv9_a.h
> >>>>> @@ -236,7 +236,7 @@ static const struct tune_params
> >>>> generic_armv9_a_tunings =
> >>>>>       1 /* store_pred.  */
> >>>>>     }, /* memmov_cost.  */
> >>>>>     3, /* issue_rate  */
> >>>>> -  AARCH64_FUSE_BASE, /* fusible_ops  */
> >>>>> +  AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops  */
> >>>>>     "32:16",      /* function_align.  */
> >>>>>     "4",          /* jump_align.  */
> >>>>>     "32:16",      /* loop_align.  */
> >>>>
> >>>> Having AARCH64_FUSE_NEOVERSE_BASE seems like a good thing, but I think
> >>>> we have to be careful about using it for generic tuning.  generic-armv9-a
> >>>> mustn't become "generic Neoverse", since it's supposed to be good for
> >>>> non-Neoverse (and non-Arm) Armv9-A cores as well.
> >>>>
> >>>> So perhaps here we should expand the macro.  Alternatively, we could add
> >>>> a comment saying that fusion macros for other CPUs can be added here as
> >>>> well, if they are unlikely to have a negative impact on other cores.
> >>>>
> >>>> Perhaps we should expand the macro for cortex-x925 as well.
> >>>
> >>> Sorry that certainly was not the intention, I thought about naming it
> >>> AARCH64_FUSE_ARMV9_BASE instead,  but since I was also using it for
> >>> non-Armv9 cores I thought this would be equally confusing.  But perhaps
> >>> In light of your comments it may make more sense naming wise?
> >>
> >> Yeah, I'd also wondered about suggesting that, but then saw that there
> >> were quite a few Armv9-A cores that don't have the flag.
> >>
> >> So I think AARCH64_FUSE_NEOVERSE_BASE is a good name, and is what we
> >> should use for the Neoverse tunings.  But perhaps we should expand the
> >> macro for the two files above.  If anyone adds any new _FUSE flags for
> >> Neoverse cores later, there should still be an explicit decision whether
> >> to include that flag for cortex-x925 and generic-armv9-a.
> >>
> >> If we end up with multiple cores of the same family that want the same
> >> flags, we can add a new macro for that family then, like you're doing
> >> here for Neoverse.
> >>
> >> (All IMO of course.)
> >
> > Understood, works for me. I'll leave it a bit for people to comment on
> > otherwise will proceed with the expansion for generic.
> >
>
> It seems to me that as long as this is beneficial on Neoverse cores and
> not harmful on the vast majority of other cores, then having it in the
> base should be ok.  But that should be mentioned in a comment somewhere.


>From a qualcomm point of view, I think enabling this won't hurt
either. So I am ok enabling this for generic-armv9-a too.

Thanks,
Andrew



>
> R.
>
> > Cheers,
> > Tamar
> >
> >>
> >> Thanks,
> >> Richard
> >
>

Re: [PATCH 1/2]AArch64: Add CMP+CSEL and CMP+CSET for cores that support it

Reply via email to