Re: [PATCH v2] aarch64, Darwin: Initial implementation of Apple cores [PR113257].

Kyrylo Tkachov Mon, 07 Apr 2025 05:25:03 -0700


> On 7 Apr 2025, at 10:21, Tamar Christina <tamar.christ...@arm.com> wrote:
> 
>> -----Original Message-----
>> From: Kyrylo Tkachov <ktkac...@nvidia.com>
>> Sent: Monday, March 31, 2025 1:43 PM
>> To: i...@sandoe.co.uk
>> Cc: Tamar Christina <tamar.christ...@arm.com>; GCC Patches <gcc-patc...@gcc.gnu.org>; Alice Carlotti <alice.carlo...@arm.com>; Richard Sandiford <richard.sandif...@arm.com>; s...@gentoo.org
>> Subject: Re: [PATCH v2] aarch64, Darwin: Initial implementation of Apple cores [PR113257].
>> 
>> Hi Iain,
>> 
>>> On 22 Mar 2025, at 15:31, Iain Sandoe <iains....@gmail.com> wrote:
>>> 
>>> 0. Sorry this has taken some time to close off; partly because of waiting
>>>  for input, but mostly that I've been stretched with other work.
>>> 1. As per the commit message, the apparent non-conformance with 8.5/6
>>>  because FEAT_SPECRES returns 0, is a result of the query operating
>>>  at user priv.  The cores are confirmed to support this for priv.
>>>  code.
>>> 2. I added entries for the apple-m1,2,3 cores in invoke.texi.
>>> 3. Following Andrew's suggestion and with some measurements by Tamar
>>>  and me, figured out the LITTLE.big chip ids (at least for a sub-
>>>  set).
>>> 
>>> This has been in use for a while on aarch64-darwin branches and I've
>>> checked manually that it gives the right .arch lines on cfarm185.
>>> 
>>> OK for trunk? (if so, when?)
>>> thanks
>>> Iain
>>> 
>>> --- 8< ---
>>> 
>>> After discussion with the open source support team at Apple, we have
>>> established that the cores conform to the 8.5 and 8.6 requirements.
>>> One of the mandatory features (FEAT_SPECRES) is not exposed (or
>>> available) in user-space code but is supported for privileged code.
>>> 
>>> The values for chip IDs and the LITTLE.big variants have been taken
>>> from lists in the XNU and LLVM sources.
>>> 
>>> PR target/113257
>>> 
>>> gcc/ChangeLog:
>>> 
>>> * config/aarch64/aarch64-cores.def (AARCH64_CORE): Add Apple-a12,
>>> Apple-M1, Apple-M2, Apple-M3 with expanded names to allow for the
>>> LITTLE.big versions.
>>> * config/aarch64/aarch64-tune.md: Regenerate.
>>> * doc/invoke.texi: Add apple-m1,2 and 3 cores to the ones listed
>>> for arch and tune selections.
>>> 
>>> Signed-off-by: Iain Sandoe <i...@sandoe.co.uk>
>>> ---
>>> gcc/config/aarch64/aarch64-cores.def | 16 ++++++++++++++++
>>> gcc/config/aarch64/aarch64-tune.md   |  2 +-
>>> gcc/doc/invoke.texi                  |  5 +++--
>>> 3 files changed, 20 insertions(+), 3 deletions(-)
>>> 
>>> diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
>>> index 0e22d72976e..7f204fd0ac9 100644
>>> --- a/gcc/config/aarch64/aarch64-cores.def
>>> +++ b/gcc/config/aarch64/aarch64-cores.def
>>> @@ -173,6 +173,22 @@ AARCH64_CORE("cortex-a76.cortex-a55", cortexa76cortexa55, cortexa53, V8_2A,  (F
>>> AARCH64_CORE("cortex-r82", cortexr82, cortexa53, V8R, (), cortexa53, 0x41, 0xd15, -1)
>>> AARCH64_CORE("cortex-r82ae", cortexr82ae, cortexa53, V8R, (), cortexa53, 0x41, 0xd14, -1)
>>> 
>>> +/* Apple (A12 and M) cores.
>>> +   Known part numbers as listed in other public sources.
>>> +   Placeholders for schedulers, generic_armv8_a for costs.
>>> +   A12 seems mostly 8.3, M1 is 8.5 without BTI, M2 and M3 are 8.6
>>> +   From measurements made so far the odd-number core IDs are performance.  */
>>> +AARCH64_CORE("apple-a12", applea12, cortexa53, V8_3A,  (), generic_armv8_a, 0x61, 0x12, -1)
>>> +AARCH64_CORE("apple-m1", applem1_0, cortexa57, V8_5A,  (), generic_armv8_a, 0x61, AARCH64_BIG_LITTLE (0x21, 0x20), -1)
>>> +AARCH64_CORE("apple-m1", applem1_1, cortexa57, V8_5A,  (), generic_armv8_a, 0x61, AARCH64_BIG_LITTLE (0x23, 0x22), -1)
>>> +AARCH64_CORE("apple-m1", applem1_2, cortexa57, V8_5A,  (), generic_armv8_a, 0x61, AARCH64_BIG_LITTLE (0x25, 0x24), -1)
>>> +AARCH64_CORE("apple-m1", applem1_3, cortexa57, V8_5A,  (), generic_armv8_a, 0x61, AARCH64_BIG_LITTLE (0x29, 0x28), -1)
>>> +AARCH64_CORE("apple-m2", applem2_0, cortexa57, V8_6A,  (), generic_armv8_a, 0x61, AARCH64_BIG_LITTLE (0x31, 0x30), -1)
>>> +AARCH64_CORE("apple-m2", applem2_1, cortexa57, V8_6A,  (), generic_armv8_a, 0x61, AARCH64_BIG_LITTLE (0x33, 0x32), -1)
>>> +AARCH64_CORE("apple-m2", applem2_2, cortexa57, V8_6A,  (), generic_armv8_a, 0x61, AARCH64_BIG_LITTLE (0x35, 0x34), -1)
>>> +AARCH64_CORE("apple-m2", applem2_3, cortexa57, V8_6A,  (), generic_armv8_a, 0x61, AARCH64_BIG_LITTLE (0x39, 0x38), -1)
>>> +AARCH64_CORE("apple-m3", applem3_0, cortexa57, V8_6A,  (), generic_armv8_a, 0x61, AARCH64_BIG_LITTLE (0x49, 0x48), -1)
>> 
>> I don’t think we have precedent of different MIDR part numbers resolving to
>> the same -mcpu string, but I think it should all work as expected.
> 
> Indeed, I think for the current usage it should work fine.
> 
>> As long as you and Tamar are happy with the feature set here no objections
>> from me.
> 
> FWIW no objections from me.  This should unblock folks 😊
> 
> Thanks,
> Tamar
> 
>> Looks ok to me for GCC 15 with a documentation comment below…
>> 
>>> +
>>> /* Armv9.0-A Architecture Processors.  */
>>> 
>>> /* Arm ('A') cores. */
>>> diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md
>>> index 56a914f12b9..982074c2c21 100644
>>> --- a/gcc/config/aarch64/aarch64-tune.md
>>> +++ b/gcc/config/aarch64/aarch64-tune.md
>>> @@ -1,5 +1,5 @@
>>> ;; -*- buffer-read-only: t -*-
>>> ;; Generated automatically by gentune.sh from aarch64-cores.def
>>> (define_attr "tune"
>>> - "cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88,thunderxt88p1,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,ampere1b,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,fujitsu_monaka,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,oryon1,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexr82ae,cortexa510,cortexa520,cortexa520ae,cortexa710,cortexa715,cortexa720,cortexa720ae,cortexa725,cortexx2,cortexx3,cortexx4,cortexx925,neoversen2,cobalt100,neoversen3,neoversev2,grace,neoversev3,neoversev3ae,demeter,olympus,generic,generic_armv8_a,generic_armv9_a"
>>> + "cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88,thunderxt88p1,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,ampere1b,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,fujitsu_monaka,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,oryon1,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexr82ae,applea12,applem1_0,applem1_1,applem1_2,applem1_3,applem2_0,applem2_1,applem2_2,applem2_3,applem3_0,cortexa510,cortexa520,cortexa520ae,cortexa710,cortexa715,cortexa720,cortexa720ae,cortexa725,cortexx2,cortexx3,cortexx4,cortexx925,neoversen2,cobalt100,neoversen3,neoversev2,grace,neoversev3,neoversev3ae,demeter,olympus,generic,generic_armv8_a,generic_armv9_a"
>>> (const (symbol_ref "((enum attr_tune) aarch64_tune)")))
>>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>>> index 515d91ac2e3..f8f712d1877 100644
>>> --- a/gcc/doc/invoke.texi
>>> +++ b/gcc/doc/invoke.texi
>>> @@ -21763,7 +21763,8 @@ performance of the code.  Permissible values for this option are:
>>> @samp{cortex-x2}, @samp{cortex-x3}, @samp{cortex-x4}, @samp{cortex-a510},
>>> @samp{cortex-a520}, @samp{cortex-a520ae}, @samp{cortex-a710}, @samp{cortex-a715},
>>> @samp{cortex-a720}, @samp{cortex-a720ae}, @samp{ampere1}, @samp{ampere1a},
>>> -@samp{ampere1b}, @samp{cobalt-100} and @samp{native}.
>>> +@samp{ampere1b}, @samp{cobalt-100}, @samp{apple-m1}, @samp{apple-m2},
>>> +@samp{apple-m3} and @samp{native}.
>>> 
>>> The values @samp{cortex-a57.cortex-a53}, @samp{cortex-a72.cortex-a53},
>>> @samp{cortex-a73.cortex-a35}, @samp{cortex-a73.cortex-a53},
>>> @@ -23842,7 +23843,7 @@ Permissible names are: @samp{arm7tdmi}, @samp{arm7tdmi-s}, @samp{arm710t},
>>> @samp{neoverse-n1}, @samp{neoverse-n2}, @samp{neoverse-v1}, @samp{xscale},
>>> @samp{iwmmxt}, @samp{iwmmxt2}, @samp{ep9312}, @samp{fa526}, @samp{fa626},
>>> @samp{fa606te}, @samp{fa626te}, @samp{fmp626}, @samp{fa726te}, @samp{star-mc1},
>>> -@samp{xgene1}.
>>> +@samp{xgene1}, @samp{apple-m1}, @samp{apple-m2}, @samp{apple-m3}.
>> 
>> This looks like the section for (32-bit) arm rather than aarch64.
>> 
>> 
>>> 
>>> Additionally, this option can specify that GCC should tune the performance
>>> of the code for a big.LITTLE system.  Permissible names are:
>> 
>> There is a similar section about big.LITTLE in the aarch64 section where the
>> b.L options you add in the patch should be listed.


Ok, then ok for trunk with the documentation point fixed.
Thanks,
Kyrill


>> 
>>> --
>>> 2.39.2 (Apple Git-143)
>>> 
>
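
Illustration (not part of the patch): the review point above, that several
MIDR part numbers resolve to the same -mcpu string, can be sketched as a
small lookup table. This is only a sketch under stated assumptions. The
struct, the table name and the function mcpu_for_midr are hypothetical and
are not the code GCC uses for native CPU detection. Only the implementer ID
(0x61) and the part numbers are taken from the quoted aarch64-cores.def hunk.

```c
#include <stddef.h>

/* Hypothetical table entry; the field names are illustrative only.  */
struct core_entry
{
  const char *mcpu;       /* name accepted by -mcpu / -mtune */
  unsigned implementer;   /* MIDR implementer field */
  unsigned big_part;      /* part number of the performance core */
  unsigned little_part;   /* part number of the efficiency core;
                             same as big_part when there is only one */
};

/* Values copied from the AARCH64_CORE entries quoted in the patch.  */
static const struct core_entry apple_cores[] = {
  { "apple-a12", 0x61, 0x12, 0x12 },
  { "apple-m1",  0x61, 0x21, 0x20 },
  { "apple-m1",  0x61, 0x23, 0x22 },
  { "apple-m1",  0x61, 0x25, 0x24 },
  { "apple-m1",  0x61, 0x29, 0x28 },
  { "apple-m2",  0x61, 0x31, 0x30 },
  { "apple-m2",  0x61, 0x33, 0x32 },
  { "apple-m2",  0x61, 0x35, 0x34 },
  { "apple-m2",  0x61, 0x39, 0x38 },
  { "apple-m3",  0x61, 0x49, 0x48 },
};

/* Return the -mcpu name for an (implementer, part) pair, or NULL if the
   pair is not in the table.  Several rows share one name, so different
   part numbers can map to the same string.  */
const char *
mcpu_for_midr (unsigned implementer, unsigned part)
{
  size_t n = sizeof apple_cores / sizeof apple_cores[0];
  for (size_t i = 0; i < n; i++)
    {
      const struct core_entry *c = &apple_cores[i];
      if (c->implementer == implementer
          && (c->big_part == part || c->little_part == part))
        return c->mcpu;
    }
  return NULL;
}
```

With this table, mcpu_for_midr (0x61, 0x21) and mcpu_for_midr (0x61, 0x28)
both yield "apple-m1", which is the many-to-one mapping discussed in the
review; an unknown part number yields NULL.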
