On Mon, Jan 20, 2025 at 06:29:12PM +0000, Tamar Christina wrote: > > -----Original Message----- > > From: Iain Sandoe <i...@sandoe.co.uk> > > Sent: Monday, January 20, 2025 6:15 PM > > To: Andrew Carlotti <andrew.carlo...@arm.com> > > Cc: Kyrylo Tkachov <ktkac...@nvidia.com>; GCC Patches <gcc- > > patc...@gcc.gnu.org>; Tamar Christina <tamar.christ...@arm.com>; Richard > > Sandiford <richard.sandif...@arm.com>; Sam James <s...@gentoo.org> > > Subject: Re: [PATCH] aarch64: Provide initial specifications for Apple CPU > > cores. > > > > > > > > > On 20 Jan 2025, at 17:38, Andrew Carlotti <andrew.carlo...@arm.com> wrote: > > > > > > On Sun, Jan 19, 2025 at 09:14:17PM +0000, Iain Sandoe wrote: > > >> > > > > >> Please note that the Darwin assembler is Apple’s LLVM backend (invoked > > >> via > > clang -cc1as) > > >> and that means that whatever GCC deduces for the feature set and outputs > > >> into > > the asm > > >> in the .arch line has to mean the same to LLVM as it does to GCC. > > >> > > >> For example: > > >> I ran into a problem (that might or might not still exist) where > > >> specifying crc on > > top of a > > >> 8.4 spec dropped the base rev assumed back to 8.x where crc was > > >> introduced > > (that could > > >> be just a bug, but I have to live with it). > > > > > > I remembered reading about this issue before; after some searching I > > > discoved > > > that it was via another email from you in 2023: > > > https://gcc.gnu.org/pipermail/gcc/2023-October/242748.html > > > > > > So I did some more digging. I believe the CRC workaround was added to > > > handle > > > misspecified CPUs in Binutils. In particular: > > > > > > - +crc only existed from Feb 2013 (e60bb1dd3) > > > > This is a “solved problem” on the Darwni development branch ( I made a > > configure > > check to back out of the fix if it isn’t needed) … but see below… > > > > >>> How about the llvm/lib/TargetParser/Host.cpp in upstream LLVM for the > > >>> part > > numbers? > > >>> I see it has different values for the M1,M2,M3 ones that you have in > > >>> your > > patch. > > >> > > >> Looking at a recent version of that I see the host-side values that we > > >> use when > > >> doing the native query. These are obtained from the OS - not the chip > > >> directly. > > >> (its a sysctl call). > > >> > > >> What I do not see is the manufacturer/chip pairs that you get in > > >> /proc/cpuinfo. > > >> (of course I could be blind :) ) > > > > > > I tried looking myself, and the only relevant stuff I found uses > > > hw.cpufamily > > > instead. > > > > I pulled an updated version and I see that we now have the opposite issue > > the > > Linux section now lists manufacturer=x61 (apple) and then several chip > > values > > that map onto the m1, 2 and 3. I guess that’s a new use-case, I don’t see > > anything > > in the def file that caters to many-chip-id => one core mappings > > > > >>>> > > >>>> * Currently, we do not seem to have any way to specify that M2/M3 has > > support > > >>>> for FEAT_BTI, but because of missing feaures is not compliant with the > > >>>> Arm > > >>>> base rev that implies this. > > >>> > > >>> Since FEAT_BTI only adds hint instructions, I don't think any part of > > >>> the > > >>> compiler actually checks for whether the feature is supported. Whether > > >>> or not > > >>> to emit FEAT_BTI instructions is controlled by a different compiler > > >>> option. > > >> > > >> I guess the question then is how do we enable it for apple-m2+ and not > > >> for the > > >> m1 (or are you saying it does not matter, since the lower revs would > > >> just treat > > >> the hint as NOP)? > > > > > > I'm saying it doesn't matter. > > > > ack. > > > > >>>> +/* Apple (A12 and M) cores based on Armv8. > > >>>> + Apple implementer ID from xnu, > > >>>> + Guesses for part # and suitable scheduler ident, generic_armv8_a > > >>>> for costs. > > >>>> + A12 seems mostly 8.3, > > >>>> + M1 seems to be 8.4 + extras (see comments in option-extensions > > >>>> about > > f16fml), > > >>>> + M2 mostly 8.5 but with missing mandatory features. > > >>>> + M3 is essentially the same as M2 for the features declared here. > > >>>> */ > > >>>> +AARCH64_CORE("apple-a12", applea12, cortexa53, V8_3A, (), > > generic_armv8_a, 0x61, 0x12, -1) > > >>>> +AARCH64_CORE("apple-m1", applem1, cortexa57, V8_4A, (F16, SB, SSBS), > > generic_armv8_a, 0x61, 0x23, -1) > > >>>> +AARCH64_CORE("apple-m2", applem2, cortexa57, V8_4A, (I8MM, BF16, > > F16, SB, SSBS), generic_armv8_a, 0x61, 0x23, -1) > > >>>> +AARCH64_CORE("apple-m3", applem3, cortexa57, V8_4A, (I8MM, BF16, > > F16, SB, SSBS), generic_armv8_a, 0x61, 0x23, -1) > > >>>> + > > > > > > Are the Apple cpus actually big/little implementations, > > > > Yes > > > > > in which case there will be two different part ids? If so, then perhaps > > > this should > > combine the > > > two part numbers using the AARCH64_BIG_LITTLE macro. I'm not totally > > > sure, > > > however, since I don't see other recent implementation handled in this > > > manner. > > > > We've not really had any. Many of our recent CPUs in this space are big > medium Little > and there are many ways you can combine them and in which they are combined. > > As such we've omitted them, mainly because at the end of the day you can't > (easily) > Run gcc directly on these devices anyway and we would have to pick a core for > the > codegen anyway. > > These cores are somewhat different as you have Asahi, where GCC does run. > Without specifying AARCH64_BIG_LITTLE macro with both cores (see some > examples in > the cores file) we'd simply not detect the system during -march=native. > > > me either - and for the short-term I am happy to treat them as one core (the > > feature sets > > must match so it does not seem to make code-gen differences). > > I'd agree, the only thing missing out here is that for native detection we'd > loose > that they were Armv8.4-a, or Armv8.3-a cores, but that really has little > material effect > anymore in GCC. > > I would say if we can find both core IDs we should use them, otherwise this > is already > an improvement on the situation.
There are some part numbers listed at: https://github.com/AsahiLinux/docs/wiki/HW:ARM-System-Registers That only seems to cover apple-a12 and apple-m1. > > Thanks, > Tamar > > > > > >>> > > >>> Comparing to LLVM's AArch64Processors.td, this seems to be missing a few > > things: > > >>> - Crpyto extensions (SHA2 and AES, and SHA3 from apple-m1); > > >> > > >> I do not see FEAT_SHA2 listed in either the Arm doc, or the output from > > >> the > > sysctl. > > >> FEAT_AES: 1 > > >> FEAT_SHA3: 1 > > >> So I’ve added those to the three entries. > > > > > > There some architecture feature names that are effectively aliases in the > > > spec, > > > although identifying this requires reading the restrictions of the id > > > register > > > fields (and at least one version of the spec accidentally omitted one of > > > the > > > dependencies). In summary: > > > - +sha2 = FEAT_SHA1 and FEAT_SHA256 > > > - +aes = FEAT_AES and FEAT_PMULL > > > - +sha3 = FEAT_SHA512 and FEAT_SHA3 > > > > thanks - that was not obvious. > > > > However, if I add any of these to the 8.4 spec, LLVM’s back end (at least > > the ones > > via xcode) drops the arch rev down and we fail to build libgcc because of > > missing > > support for fp16. > > > > This is likely a bug - but I don’t really know how to describe it at the > > moment - and > > it won’t make any difference to the assemblers already in the wild - so I > > will leave > > these out of the list for now. > > > > >>> - New flags I just added (FRINTTS and FLAGM2 from apple-m1); > > >> FEAT_FRINTTS: 1 > > >> FEAT_FlagM2: 1 > > >> So I;ve added those. > > > > The build with these added succeeded with no change in test results. > > > > >> > > >>> - PREDRES (from apple-m1) > > >> > > >> I cannot find FEAT_PREDRES … > > >> … however we do have > > >> FEAT_SPECRES: 0 > > > > > > FEAT_SPECRES in the architecture spec is the same as the +predres > > > toolchain > > > flag. LLVM seems to think the is supported from apple-m1. > > > > The Linux (cfarm103) for M1 says: > > > > Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp > > asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm > > dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint > > > > I do not see mention of predres or specres there. > > (which seems to agree with the output of sysctl under XNU). > > > > Advice welcome - I guess we could say “well the apple toolchains are > > sending v8.5 > > to llvm regardless of this, so it does not matter if GCC does the same”. > > OTOH - one > > reason for posting this patch is for Linux hosted on the same h.w and that > > will, > > presumably, be using GNU binutils … > > > > thanks again. > > Iain > > >