Re: [PATCH 1/2]AArch64: Add CMP+CSEL and CMP+CSET for cores that support it

Richard Sandiford Wed, 11 Dec 2024 01:50:25 -0800

Tamar Christina <tamar.christ...@arm.com> writes:
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandif...@arm.com>
>> Sent: Wednesday, December 11, 2024 9:32 AM
>> To: Tamar Christina <tamar.christ...@arm.com>
>> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; Richard Earnshaw
>> <richard.earns...@arm.com>; ktkac...@gcc.gnu.org
>> Subject: Re: [PATCH 1/2]AArch64: Add CMP+CSEL and CMP+CSET for cores that
>> support it
>> 
>> Tamar Christina <tamar.christ...@arm.com> writes:
>> > Hi All,
>> >
>> > GCC 15 added two new fusions CMP+CSEL and CMP+CSET.
>> >
>> > This patch enables them for cores that support based on their Software
>> > Optimization Guides and generically on Armv9-A.   Even if a core does not
>> > support it there's no negative performance impact.
>> >
>> >
>> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>> >
>> > Ok for master?
>> >
>> > Thanks,
>> > Tamar
>> >
>> > gcc/ChangeLog:
>> >
>> >    * config/aarch64/aarch64-fusion-pairs.def
>> (AARCH64_FUSE_NEOVERSE_BASE):
>> >    New.
>> >    * config/aarch64/tuning_models/cortexx925.h: Use it.
>> >    * config/aarch64/tuning_models/generic_armv9_a.h: Use it.
>> >    * config/aarch64/tuning_models/neoverse512tvb.h: Use it.
>> >    * config/aarch64/tuning_models/neoversen2.h: Use it.
>> >    * config/aarch64/tuning_models/neoversen3.h: Use it.
>> >    * config/aarch64/tuning_models/neoversev1.h: Use it.
>> >    * config/aarch64/tuning_models/neoversev2.h: Use it.
>> >    * config/aarch64/tuning_models/neoversev3.h: Use it.
>> >    * config/aarch64/tuning_models/neoversev3ae.h: Use it.
>> >
>> > ---
>> >
>> > diff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def
>> b/gcc/config/aarch64/aarch64-fusion-pairs.def
>> > index
>> f8413ab0c802c28290ebcc171bfd131622cb33be..0123430d988b96b20d20376
>> df9ae7a196031286d 100644
>> > --- a/gcc/config/aarch64/aarch64-fusion-pairs.def
>> > +++ b/gcc/config/aarch64/aarch64-fusion-pairs.def
>> > @@ -45,4 +45,8 @@ AARCH64_FUSION_PAIR ("cmp+cset", CMP_CSET)
>> >  /* Baseline fusion settings suitable for all cores.  */
>> >  #define AARCH64_FUSE_BASE (AARCH64_FUSE_CMP_BRANCH |
>> AARCH64_FUSE_AES_AESMC)
>> >
>> > +/* Baseline fusion settings suitable for all Neoverse cores.  */
>> > +#define AARCH64_FUSE_NEOVERSE_BASE (AARCH64_FUSE_BASE |
>> AARCH64_FUSE_CMP_CSEL \
>> > +                              | AARCH64_FUSE_CMP_CSET)
>> > +
>> >  #define AARCH64_FUSE_MOVK (AARCH64_FUSE_MOV_MOVK |
>> AARCH64_FUSE_MOVK_MOVK)
>> > diff --git a/gcc/config/aarch64/tuning_models/cortexx925.h
>> b/gcc/config/aarch64/tuning_models/cortexx925.h
>> > index
>> eb9b89984b0472858bc08dba924c962ec4ba53bd..b3ae1576ade1f701b775496
>> def277230e193d20f 100644
>> > --- a/gcc/config/aarch64/tuning_models/cortexx925.h
>> > +++ b/gcc/config/aarch64/tuning_models/cortexx925.h
>> > @@ -205,7 +205,7 @@ static const struct tune_params cortexx925_tunings =
>> >      2 /* store_pred.  */
>> >    }, /* memmov_cost.  */
>> >    10, /* issue_rate  */
>> > -  AARCH64_FUSE_BASE, /* fusible_ops  */
>> > +  AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops  */
>> >    "32:16",        /* function_align.  */
>> >    "4",            /* jump_align.  */
>> >    "32:16",        /* loop_align.  */
>> > diff --git a/gcc/config/aarch64/tuning_models/generic_armv9_a.h
>> b/gcc/config/aarch64/tuning_models/generic_armv9_a.h
>> > index
>> 48353a59939d84647c6981d6d0551af7ce9df751..e971e645dfc4b805b7f994a1
>> 5a5df7803ff4dc6c 100644
>> > --- a/gcc/config/aarch64/tuning_models/generic_armv9_a.h
>> > +++ b/gcc/config/aarch64/tuning_models/generic_armv9_a.h
>> > @@ -236,7 +236,7 @@ static const struct tune_params
>> generic_armv9_a_tunings =
>> >      1 /* store_pred.  */
>> >    }, /* memmov_cost.  */
>> >    3, /* issue_rate  */
>> > -  AARCH64_FUSE_BASE, /* fusible_ops  */
>> > +  AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops  */
>> >    "32:16",        /* function_align.  */
>> >    "4",            /* jump_align.  */
>> >    "32:16",        /* loop_align.  */
>> 
>> Having AARCH64_FUSE_NEOVERSE_BASE seems like a good thing, but I think
>> we have to be careful about using it for generic tuning.  generic-armv9-a
>> mustn't become "generic Neoverse", since it's supposed to be good for
>> non-Neoverse (and non-Arm) Armv9-A cores as well.
>> 
>> So perhaps here we should expand the macro.  Alternatively, we could add
>> a comment saying that fusion macros for other CPUs can be added here as
>> well, if they are unlikely to have a negative impact on other cores.
>> 
>> Perhaps we should expand the macro for cortex-x925 as well.
>
> Sorry that certainly was not the intention, I thought about naming it
> AARCH64_FUSE_ARMV9_BASE instead,  but since I was also using it for
> non-Armv9 cores I thought this would be equally confusing.  But perhaps
> In light of your comments it may make more sense naming wise?


Yeah, I'd also wondered about suggesting that, but then saw that there
were quite a few Armv9-A cores that don't have the flag.

So I think AARCH64_FUSE_NEOVERSE_BASE is a good name, and is what we
should use for the Neoverse tunings.  But perhaps we should expand the
macro for the two files above.  If anyone adds any new _FUSE flags for
Neoverse cores later, there should still be an explicit decision whether
to include that flag for cortex-x925 and generic-armv9-a.

If we end up with multiple cores of the same family that want the same
flags, we can add a new macro for that family then, like you're doing
here for Neoverse.

(All IMO of course.)

Thanks,
Richard

Re: [PATCH 1/2]AArch64: Add CMP+CSEL and CMP+CSET for cores that support it

Reply via email to