Re: [PATCH 1/2]AArch64: Add CMP+CSEL and CMP+CSET for cores that support it

Kyrylo Tkachov Thu, 12 Dec 2024 07:12:34 -0800

Thanks for doing this Tamar,

> On 11 Dec 2024, at 10:54, Tamar Christina <tamar.christ...@arm.com> wrote:
> 
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandif...@arm.com>
>> Sent: Wednesday, December 11, 2024 9:50 AM
>> To: Tamar Christina <tamar.christ...@arm.com>
>> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; Richard Earnshaw
>> <richard.earns...@arm.com>; ktkac...@gcc.gnu.org
>> Subject: Re: [PATCH 1/2]AArch64: Add CMP+CSEL and CMP+CSET for cores that
>> support it
>> 
>> Tamar Christina <tamar.christ...@arm.com> writes:
>>>> -----Original Message-----
>>>> From: Richard Sandiford <richard.sandif...@arm.com>
>>>> Sent: Wednesday, December 11, 2024 9:32 AM
>>>> To: Tamar Christina <tamar.christ...@arm.com>
>>>> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; Richard Earnshaw
>>>> <richard.earns...@arm.com>; ktkac...@gcc.gnu.org
>>>> Subject: Re: [PATCH 1/2]AArch64: Add CMP+CSEL and CMP+CSET for cores that
>>>> support it
>>>> 
>>>> Tamar Christina <tamar.christ...@arm.com> writes:
>>>>> Hi All,
>>>>> 
>>>>> GCC 15 added two new fusions CMP+CSEL and CMP+CSET.
>>>>> 
>>>>> This patch enables them for cores that support based on their Software
>>>>> Optimization Guides and generically on Armv9-A.   Even if a core does not
>>>>> support it there's no negative performance impact.
>>>>> 
>>>>> 
>>>>> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>>>>> 
>>>>> Ok for master?
>>>>> 
>>>>> Thanks,
>>>>> Tamar
>>>>> 
>>>>> gcc/ChangeLog:
>>>>> 
>>>>> * config/aarch64/aarch64-fusion-pairs.def
>>>> (AARCH64_FUSE_NEOVERSE_BASE):
>>>>> New.
>>>>> * config/aarch64/tuning_models/cortexx925.h: Use it.
>>>>> * config/aarch64/tuning_models/generic_armv9_a.h: Use it.
>>>>> * config/aarch64/tuning_models/neoverse512tvb.h: Use it.
>>>>> * config/aarch64/tuning_models/neoversen2.h: Use it.
>>>>> * config/aarch64/tuning_models/neoversen3.h: Use it.
>>>>> * config/aarch64/tuning_models/neoversev1.h: Use it.
>>>>> * config/aarch64/tuning_models/neoversev2.h: Use it.
>>>>> * config/aarch64/tuning_models/neoversev3.h: Use it.
>>>>> * config/aarch64/tuning_models/neoversev3ae.h: Use it.
>>>>> 
>>>>> ---
>>>>> 
>>>>> diff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def
>>>> b/gcc/config/aarch64/aarch64-fusion-pairs.def
>>>>> index
>>>> 
>> f8413ab0c802c28290ebcc171bfd131622cb33be..0123430d988b96b20d20376
>>>> df9ae7a196031286d 100644
>>>>> --- a/gcc/config/aarch64/aarch64-fusion-pairs.def
>>>>> +++ b/gcc/config/aarch64/aarch64-fusion-pairs.def
>>>>> @@ -45,4 +45,8 @@ AARCH64_FUSION_PAIR ("cmp+cset", CMP_CSET)
>>>>> /* Baseline fusion settings suitable for all cores.  */
>>>>> #define AARCH64_FUSE_BASE (AARCH64_FUSE_CMP_BRANCH |
>>>> AARCH64_FUSE_AES_AESMC)
>>>>> 
>>>>> +/* Baseline fusion settings suitable for all Neoverse cores.  */
>>>>> +#define AARCH64_FUSE_NEOVERSE_BASE (AARCH64_FUSE_BASE |
>>>> AARCH64_FUSE_CMP_CSEL \
>>>>> +    | AARCH64_FUSE_CMP_CSET)
>>>>> +
>>>>> #define AARCH64_FUSE_MOVK (AARCH64_FUSE_MOV_MOVK |
>>>> AARCH64_FUSE_MOVK_MOVK)
>>>>> diff --git a/gcc/config/aarch64/tuning_models/cortexx925.h
>>>> b/gcc/config/aarch64/tuning_models/cortexx925.h
>>>>> index
>>>> 
>> eb9b89984b0472858bc08dba924c962ec4ba53bd..b3ae1576ade1f701b775496
>>>> def277230e193d20f 100644
>>>>> --- a/gcc/config/aarch64/tuning_models/cortexx925.h
>>>>> +++ b/gcc/config/aarch64/tuning_models/cortexx925.h
>>>>> @@ -205,7 +205,7 @@ static const struct tune_params cortexx925_tunings =
>>>>>     2 /* store_pred.  */
>>>>>   }, /* memmov_cost.  */
>>>>>   10, /* issue_rate  */
>>>>> -  AARCH64_FUSE_BASE, /* fusible_ops  */
>>>>> +  AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops  */
>>>>>   "32:16", /* function_align.  */
>>>>>   "4", /* jump_align.  */
>>>>>   "32:16", /* loop_align.  */
>>>>> diff --git a/gcc/config/aarch64/tuning_models/generic_armv9_a.h
>>>> b/gcc/config/aarch64/tuning_models/generic_armv9_a.h
>>>>> index
>>>> 
>> 48353a59939d84647c6981d6d0551af7ce9df751..e971e645dfc4b805b7f994a1
>>>> 5a5df7803ff4dc6c 100644
>>>>> --- a/gcc/config/aarch64/tuning_models/generic_armv9_a.h
>>>>> +++ b/gcc/config/aarch64/tuning_models/generic_armv9_a.h
>>>>> @@ -236,7 +236,7 @@ static const struct tune_params
>>>> generic_armv9_a_tunings =
>>>>>     1 /* store_pred.  */
>>>>>   }, /* memmov_cost.  */
>>>>>   3, /* issue_rate  */
>>>>> -  AARCH64_FUSE_BASE, /* fusible_ops  */
>>>>> +  AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops  */
>>>>>   "32:16", /* function_align.  */
>>>>>   "4", /* jump_align.  */
>>>>>   "32:16", /* loop_align.  */
>>>> 
>>>> Having AARCH64_FUSE_NEOVERSE_BASE seems like a good thing, but I think
>>>> we have to be careful about using it for generic tuning.  generic-armv9-a
>>>> mustn't become "generic Neoverse", since it's supposed to be good for
>>>> non-Neoverse (and non-Arm) Armv9-A cores as well.
>>>> 
>>>> So perhaps here we should expand the macro.  Alternatively, we could add
>>>> a comment saying that fusion macros for other CPUs can be added here as
>>>> well, if they are unlikely to have a negative impact on other cores.
>>>> 
>>>> Perhaps we should expand the macro for cortex-x925 as well.
>>> 
>>> Sorry that certainly was not the intention, I thought about naming it
>>> AARCH64_FUSE_ARMV9_BASE instead,  but since I was also using it for
>>> non-Armv9 cores I thought this would be equally confusing.  But perhaps
>>> In light of your comments it may make more sense naming wise?
>> 
>> Yeah, I'd also wondered about suggesting that, but then saw that there
>> were quite a few Armv9-A cores that don't have the flag.
>> 
>> So I think AARCH64_FUSE_NEOVERSE_BASE is a good name, and is what we
>> should use for the Neoverse tunings.  But perhaps we should expand the
>> macro for the two files above.  If anyone adds any new _FUSE flags for
>> Neoverse cores later, there should still be an explicit decision whether
>> to include that flag for cortex-x925 and generic-armv9-a.
>> 
>> If we end up with multiple cores of the same family that want the same
>> flags, we can add a new macro for that family then, like you're doing
>> here for Neoverse.
>> 
>> (All IMO of course.)
> 
> Understood, works for me. I'll leave it a bit for people to comment on
> otherwise will proceed with the expansion for generic.


I’m happy to group these under a common flag, but I think that NEOVERSE in the 
name makes it unnecessary specific.
But I also can’t come up with a nice-sounding name to suggest instead (these 
seem to be “big core” flags) and since this is just an internal identifier 
rather than a user-facing name I’m not too fussed about it.

Thanks,
Kyrill

> 
> Cheers,
> Tamar
> 
>> 
>> Thanks,
>> Richard
>

Re: [PATCH 1/2]AArch64: Add CMP+CSEL and CMP+CSET for cores that support it

Reply via email to