On Mon, Jan 20, 2025 at 06:29:12PM +0000, Tamar Christina wrote:
> > -----Original Message-----
> > From: Iain Sandoe <i...@sandoe.co.uk>
> > Sent: Monday, January 20, 2025 6:15 PM
> > To: Andrew Carlotti <andrew.carlo...@arm.com>
> > Cc: Kyrylo Tkachov <ktkac...@nvidia.com>; GCC Patches <gcc-
> > patc...@gcc.gnu.org>; Tamar Christina <tamar.christ...@arm.com>; Richard
> > Sandiford <richard.sandif...@arm.com>; Sam James <s...@gentoo.org>
> > Subject: Re: [PATCH] aarch64: Provide initial specifications for Apple CPU 
> > cores.
> >
> >
> >
> > > On 20 Jan 2025, at 17:38, Andrew Carlotti <andrew.carlo...@arm.com> wrote:
> > >
> > > On Sun, Jan 19, 2025 at 09:14:17PM +0000, Iain Sandoe wrote:
> > >>
> >
> > >> Please note that the Darwin assembler is Apple’s LLVM backend (invoked 
> > >> via
> > clang -cc1as)
> > >> and that means that whatever GCC deduces for the feature set and outputs 
> > >> into
> > the asm
> > >> in the .arch line has to mean the same to LLVM as it does to GCC.
> > >>
> > >> For example:
> > >> I ran into a problem (that might or might not still exist) where 
> > >> specifying crc on
> > top of a
> > >> 8.4 spec dropped the base rev assumed back to 8.x where crc was 
> > >> introduced
> > (that could
> > >> be just a bug, but I have to live with it).
> > >
> > > I remembered reading about this issue before; after some searching I 
> > > discoved
> > > that it was via another email from you in 2023:
> > > https://gcc.gnu.org/pipermail/gcc/2023-October/242748.html
> > >
> > > So I did some more digging.  I believe the CRC workaround was added to 
> > > handle
> > > misspecified CPUs in Binutils.  In particular:
> > >
> > > - +crc only existed from Feb 2013 (e60bb1dd3)
> >
> > This is a “solved problem” on the Darwni development branch ( I made a 
> > configure
> > check to back out of the fix if it isn’t needed) … but see below…
> >
> > >>> How about the llvm/lib/TargetParser/Host.cpp in upstream LLVM for the 
> > >>> part
> > numbers?
> > >>> I see it has different values for the M1,M2,M3 ones that you have in 
> > >>> your
> > patch.
> > >>
> > >> Looking at a recent version of that I see the host-side values that we 
> > >> use when
> > >> doing the native query.  These are obtained from the OS - not the chip 
> > >> directly.
> > >> (its a sysctl call).
> > >>
> > >> What I do not see is the manufacturer/chip pairs that you get in 
> > >> /proc/cpuinfo.
> > >> (of course I could be blind :) )
> > >
> > > I tried looking myself, and the only relevant stuff I found uses 
> > > hw.cpufamily
> > > instead.
> >
> > I pulled an updated version and I see that we now have the opposite issue 
> > the
> > Linux section now lists manufacturer=x61 (apple) and then several chip 
> > values
> > that map onto the m1, 2 and 3.  I guess that’s a new use-case, I don’t see 
> > anything
> > in the def file that caters to many-chip-id => one core mappings
> >
> > >>>>
> > >>>> * Currently, we do not seem to have any way to specify that M2/M3 has
> > support
> > >>>> for FEAT_BTI, but because of missing feaures is not compliant with the 
> > >>>> Arm
> > >>>> base rev that implies this.
> > >>>
> > >>> Since FEAT_BTI only adds hint instructions, I don't think any part of 
> > >>> the
> > >>> compiler actually checks for whether the feature is supported.  Whether 
> > >>> or not
> > >>> to emit FEAT_BTI instructions is controlled by a different compiler 
> > >>> option.
> > >>
> > >> I guess the question then is how do we enable it for apple-m2+ and not 
> > >> for the
> > >> m1 (or are you saying it does not matter, since the lower revs would 
> > >> just treat
> > >> the hint as NOP)?
> > >
> > > I'm saying it doesn't matter.
> >
> > ack.
> >
> > >>>> +/* Apple (A12 and M) cores based on Armv8.
> > >>>> +   Apple implementer ID from xnu,
> > >>>> +   Guesses for part # and suitable scheduler ident, generic_armv8_a 
> > >>>> for costs.
> > >>>> +   A12 seems mostly 8.3,
> > >>>> +   M1 seems to be 8.4 + extras (see comments in option-extensions 
> > >>>> about
> > f16fml),
> > >>>> +   M2 mostly 8.5 but with missing mandatory features.
> > >>>> +   M3 is essentially the same as M2 for the features declared here.  
> > >>>> */
> > >>>> +AARCH64_CORE("apple-a12", applea12, cortexa53, V8_3A,  (),
> > generic_armv8_a, 0x61, 0x12, -1)
> > >>>> +AARCH64_CORE("apple-m1", applem1, cortexa57, V8_4A,  (F16, SB, SSBS),
> > generic_armv8_a, 0x61, 0x23, -1)
> > >>>> +AARCH64_CORE("apple-m2", applem2, cortexa57, V8_4A,  (I8MM, BF16,
> > F16, SB, SSBS), generic_armv8_a, 0x61, 0x23, -1)
> > >>>> +AARCH64_CORE("apple-m3", applem3, cortexa57, V8_4A,  (I8MM, BF16,
> > F16, SB, SSBS), generic_armv8_a, 0x61, 0x23, -1)
> > >>>> +
> > >
> > > Are the Apple cpus actually big/little implementations,
> >
> > Yes
> >
> > > in which case there will be two different part ids?  If so, then perhaps 
> > > this should
> > combine the
> > > two part numbers using the AARCH64_BIG_LITTLE macro.  I'm not totally 
> > > sure,
> > > however, since I don't see other recent implementation handled in this 
> > > manner.
> >
> 
> We've not really had any.  Many of our recent CPUs in this space are big 
> medium Little
> and there are many ways you can combine them and in which they are combined.
> 
> As such we've omitted them, mainly because at the end of the day you can't 
> (easily)
> Run gcc directly on these devices anyway and we would have to pick a core for 
> the
> codegen anyway.
> 
> These cores are somewhat different as you have Asahi, where GCC does run.
> Without specifying AARCH64_BIG_LITTLE macro with both cores (see some 
> examples in
> the cores file) we'd simply not detect the system during -march=native.
> 
> > me either - and for the short-term I am happy to treat them as one core (the
> > feature sets
> > must match so it does not seem to make code-gen differences).
> 
> I'd agree, the only thing missing out here is that for native detection we'd 
> loose
> that they were Armv8.4-a, or Armv8.3-a cores, but that really has little 
> material effect
> anymore in GCC.
> 
> I would say if we can find both core IDs we should use them, otherwise this 
> is already
> an improvement on the situation.

There are some part numbers listed at:
https://github.com/AsahiLinux/docs/wiki/HW:ARM-System-Registers

That only seems to cover apple-a12 and apple-m1.

> 
> Thanks,
> Tamar
> 
> >
> > >>>
> > >>> Comparing to LLVM's AArch64Processors.td, this seems to be missing a few
> > things:
> > >>> - Crpyto extensions (SHA2 and AES, and SHA3 from apple-m1);
> > >>
> > >> I do not see FEAT_SHA2 listed in either the Arm doc, or the output from 
> > >> the
> > sysctl.
> > >> FEAT_AES: 1
> > >> FEAT_SHA3: 1
> > >> So I’ve added those to the three entries.
> > >
> > > There some architecture feature names that are effectively aliases in the 
> > > spec,
> > > although identifying this requires reading the restrictions of the id 
> > > register
> > > fields (and at least one version of the spec accidentally omitted one of 
> > > the
> > > dependencies).  In summary:
> > > - +sha2 = FEAT_SHA1 and FEAT_SHA256
> > > - +aes = FEAT_AES and FEAT_PMULL
> > > - +sha3 = FEAT_SHA512 and FEAT_SHA3
> >
> > thanks - that was not obvious.
> >
> > However, if I add any of these to the 8.4 spec, LLVM’s back end (at least 
> > the ones
> > via xcode) drops the arch rev down and we fail to build libgcc because of 
> > missing
> > support for fp16.
> >
> > This is likely a bug - but I don’t really know how to describe it at the 
> > moment - and
> > it won’t make any difference to the assemblers already in the wild - so I 
> > will leave
> > these out of the list for now.
> >
> > >>> - New flags I just added (FRINTTS and FLAGM2 from apple-m1);
> > >> FEAT_FRINTTS: 1
> > >> FEAT_FlagM2: 1
> > >> So I;ve added those.
> >
> > The build with these added succeeded with no change in test results.
> >
> > >>
> > >>> - PREDRES (from apple-m1)
> > >>
> > >> I cannot find FEAT_PREDRES …
> > >> … however we do have
> > >> FEAT_SPECRES: 0
> > >
> > > FEAT_SPECRES in the architecture spec is the same as the +predres 
> > > toolchain
> > > flag.  LLVM seems to think the is supported from apple-m1.
> >
> > The Linux (cfarm103) for M1 says:
> >
> > Features      : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp
> > asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm
> > dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint
> >
> > I do not see mention of predres or specres there.
> > (which seems to agree with the output of sysctl under XNU).
> >
> > Advice welcome - I guess we could say “well the apple toolchains are 
> > sending v8.5
> > to llvm regardless of this, so it does not matter if GCC does the same”.  
> > OTOH - one
> > reason for posting this patch is for Linux hosted on the same h.w and that 
> > will,
> > presumably, be using GNU binutils …
> >
> > thanks again.
> > Iain
> >
> 

Reply via email to