RE: [PATCH 1/2]AArch64: Add CMP+CSEL and CMP+CSET for cores that support it

Tamar Christina Wed, 11 Dec 2024 01:55:10 -0800

> -----Original Message-----
> From: Richard Sandiford <richard.sandif...@arm.com>
> Sent: Wednesday, December 11, 2024 9:50 AM
> To: Tamar Christina <tamar.christ...@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; Richard Earnshaw
> <richard.earns...@arm.com>; ktkac...@gcc.gnu.org
> Subject: Re: [PATCH 1/2]AArch64: Add CMP+CSEL and CMP+CSET for cores that
> support it
> 
> Tamar Christina <tamar.christ...@arm.com> writes:
> >> -----Original Message-----
> >> From: Richard Sandiford <richard.sandif...@arm.com>
> >> Sent: Wednesday, December 11, 2024 9:32 AM
> >> To: Tamar Christina <tamar.christ...@arm.com>
> >> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; Richard Earnshaw
> >> <richard.earns...@arm.com>; ktkac...@gcc.gnu.org
> >> Subject: Re: [PATCH 1/2]AArch64: Add CMP+CSEL and CMP+CSET for cores that
> >> support it
> >>
> >> Tamar Christina <tamar.christ...@arm.com> writes:
> >> > Hi All,
> >> >
> >> > GCC 15 added two new fusions CMP+CSEL and CMP+CSET.
> >> >
> >> > This patch enables them for cores that support based on their Software
> >> > Optimization Guides and generically on Armv9-A.   Even if a core does not
> >> > support it there's no negative performance impact.
> >> >
> >> >
> >> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >> >
> >> > Ok for master?
> >> >
> >> > Thanks,
> >> > Tamar
> >> >
> >> > gcc/ChangeLog:
> >> >
> >> >  * config/aarch64/aarch64-fusion-pairs.def
> >> (AARCH64_FUSE_NEOVERSE_BASE):
> >> >  New.
> >> >  * config/aarch64/tuning_models/cortexx925.h: Use it.
> >> >  * config/aarch64/tuning_models/generic_armv9_a.h: Use it.
> >> >  * config/aarch64/tuning_models/neoverse512tvb.h: Use it.
> >> >  * config/aarch64/tuning_models/neoversen2.h: Use it.
> >> >  * config/aarch64/tuning_models/neoversen3.h: Use it.
> >> >  * config/aarch64/tuning_models/neoversev1.h: Use it.
> >> >  * config/aarch64/tuning_models/neoversev2.h: Use it.
> >> >  * config/aarch64/tuning_models/neoversev3.h: Use it.
> >> >  * config/aarch64/tuning_models/neoversev3ae.h: Use it.
> >> >
> >> > ---
> >> >
> >> > diff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def
> >> b/gcc/config/aarch64/aarch64-fusion-pairs.def
> >> > index
> >>
> f8413ab0c802c28290ebcc171bfd131622cb33be..0123430d988b96b20d20376
> >> df9ae7a196031286d 100644
> >> > --- a/gcc/config/aarch64/aarch64-fusion-pairs.def
> >> > +++ b/gcc/config/aarch64/aarch64-fusion-pairs.def
> >> > @@ -45,4 +45,8 @@ AARCH64_FUSION_PAIR ("cmp+cset", CMP_CSET)
> >> >  /* Baseline fusion settings suitable for all cores.  */
> >> >  #define AARCH64_FUSE_BASE (AARCH64_FUSE_CMP_BRANCH |
> >> AARCH64_FUSE_AES_AESMC)
> >> >
> >> > +/* Baseline fusion settings suitable for all Neoverse cores.  */
> >> > +#define AARCH64_FUSE_NEOVERSE_BASE (AARCH64_FUSE_BASE |
> >> AARCH64_FUSE_CMP_CSEL \
> >> > +                                    | AARCH64_FUSE_CMP_CSET)
> >> > +
> >> >  #define AARCH64_FUSE_MOVK (AARCH64_FUSE_MOV_MOVK |
> >> AARCH64_FUSE_MOVK_MOVK)
> >> > diff --git a/gcc/config/aarch64/tuning_models/cortexx925.h
> >> b/gcc/config/aarch64/tuning_models/cortexx925.h
> >> > index
> >>
> eb9b89984b0472858bc08dba924c962ec4ba53bd..b3ae1576ade1f701b775496
> >> def277230e193d20f 100644
> >> > --- a/gcc/config/aarch64/tuning_models/cortexx925.h
> >> > +++ b/gcc/config/aarch64/tuning_models/cortexx925.h
> >> > @@ -205,7 +205,7 @@ static const struct tune_params cortexx925_tunings =
> >> >      2 /* store_pred.  */
> >> >    }, /* memmov_cost.  */
> >> >    10, /* issue_rate  */
> >> > -  AARCH64_FUSE_BASE, /* fusible_ops  */
> >> > +  AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops  */
> >> >    "32:16",      /* function_align.  */
> >> >    "4",          /* jump_align.  */
> >> >    "32:16",      /* loop_align.  */
> >> > diff --git a/gcc/config/aarch64/tuning_models/generic_armv9_a.h
> >> b/gcc/config/aarch64/tuning_models/generic_armv9_a.h
> >> > index
> >>
> 48353a59939d84647c6981d6d0551af7ce9df751..e971e645dfc4b805b7f994a1
> >> 5a5df7803ff4dc6c 100644
> >> > --- a/gcc/config/aarch64/tuning_models/generic_armv9_a.h
> >> > +++ b/gcc/config/aarch64/tuning_models/generic_armv9_a.h
> >> > @@ -236,7 +236,7 @@ static const struct tune_params
> >> generic_armv9_a_tunings =
> >> >      1 /* store_pred.  */
> >> >    }, /* memmov_cost.  */
> >> >    3, /* issue_rate  */
> >> > -  AARCH64_FUSE_BASE, /* fusible_ops  */
> >> > +  AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops  */
> >> >    "32:16",      /* function_align.  */
> >> >    "4",          /* jump_align.  */
> >> >    "32:16",      /* loop_align.  */
> >>
> >> Having AARCH64_FUSE_NEOVERSE_BASE seems like a good thing, but I think
> >> we have to be careful about using it for generic tuning.  generic-armv9-a
> >> mustn't become "generic Neoverse", since it's supposed to be good for
> >> non-Neoverse (and non-Arm) Armv9-A cores as well.
> >>
> >> So perhaps here we should expand the macro.  Alternatively, we could add
> >> a comment saying that fusion macros for other CPUs can be added here as
> >> well, if they are unlikely to have a negative impact on other cores.
> >>
> >> Perhaps we should expand the macro for cortex-x925 as well.
> >
> > Sorry that certainly was not the intention, I thought about naming it
> > AARCH64_FUSE_ARMV9_BASE instead,  but since I was also using it for
> > non-Armv9 cores I thought this would be equally confusing.  But perhaps
> > In light of your comments it may make more sense naming wise?
> 
> Yeah, I'd also wondered about suggesting that, but then saw that there
> were quite a few Armv9-A cores that don't have the flag.
> 
> So I think AARCH64_FUSE_NEOVERSE_BASE is a good name, and is what we
> should use for the Neoverse tunings.  But perhaps we should expand the
> macro for the two files above.  If anyone adds any new _FUSE flags for
> Neoverse cores later, there should still be an explicit decision whether
> to include that flag for cortex-x925 and generic-armv9-a.
> 
> If we end up with multiple cores of the same family that want the same
> flags, we can add a new macro for that family then, like you're doing
> here for Neoverse.
> 
> (All IMO of course.)


Understood, works for me. I'll leave it a bit for people to comment on
otherwise will proceed with the expansion for generic.

Cheers,
Tamar

> 
> Thanks,
> Richard

RE: [PATCH 1/2]AArch64: Add CMP+CSEL and CMP+CSET for cores that support it

Reply via email to