On Tue, Jan 21, 2025 at 08:21:45PM +0000, Iain Sandoe wrote: > > > > On 20 Jan 2025, at 18:33, Andrew Carlotti <andrew.carlo...@arm.com> wrote: > > > > On Mon, Jan 20, 2025 at 06:29:12PM +0000, Tamar Christina wrote: > >>> -----Original Message----- > >>> From: Iain Sandoe <i...@sandoe.co.uk> > >>> Sent: Monday, January 20, 2025 6:15 PM > >>> To: Andrew Carlotti <andrew.carlo...@arm.com> > >>> Cc: Kyrylo Tkachov <ktkac...@nvidia.com>; GCC Patches <gcc- > >>> patc...@gcc.gnu.org>; Tamar Christina <tamar.christ...@arm.com>; Richard > >>> Sandiford <richard.sandif...@arm.com>; Sam James <s...@gentoo.org> > >>> Subject: Re: [PATCH] aarch64: Provide initial specifications for Apple > >>> CPU cores. > >>> > >>> > >>> > >>>> On 20 Jan 2025, at 17:38, Andrew Carlotti <andrew.carlo...@arm.com> > >>>> wrote: > >>>> > >>>> On Sun, Jan 19, 2025 at 09:14:17PM +0000, Iain Sandoe wrote: > > >> > >> I would say if we can find both core IDs we should use them, otherwise > >> this is already > >> an improvement on the situation. > > > > There are some part numbers listed at: > > https://github.com/AsahiLinux/docs/wiki/HW:ARM-System-Registers > > > > That only seems to cover apple-a12 and apple-m1. > > The latest llvm Host.cpp contains the opposite problem: > apple-m1 is mapped to : 0x20, 21, 22, 23, 24, 25, 28, 29 > apple-m2 is mapped to : 0x30, 31, 32, 33, 34, 35, 38, 39 > apple-m3 is mapped to : 0x48, 49 > > (if I had to speculate I might say that odd or even numbers related to > big/litte - but the > number of options ….)
If all of those numbers are correct, and assuming they always come in consecutive big/little pairs, then I think you can duplicate the names and do something like: AARCH64_CORE("apple-m1", applem1, cortexa57, V8_4A, (<FEATURES>), generic_armv8_a, 0x61, AARCH64_BIG_LITTLE (0x20, 0x21), -1) AARCH64_CORE("apple-m1", applem1a, cortexa57, V8_4A, (<FEATURES>), generic_armv8_a, 0x61, AARCH64_BIG_LITTLE (0x22, 0x23), -1) AARCH64_CORE("apple-m1", applem1b, cortexa57, V8_4A, (<FEATURES>), generic_armv8_a, 0x61, AARCH64_BIG_LITTLE (0x24, 0x25), -1) AARCH64_CORE("apple-m1", applem1c, cortexa57, V8_4A, (<FEATURES>), generic_armv8_a, 0x61, AARCH64_BIG_LITTLE (0x28, 0x29), -1) That is, you can use the same string for each one, but different IDENTs. The cost model should eventually be something better than this, and this isn't a comment on whether the listed part IDs are correct. However, this should be a valid way of representing the given cpu data within GCC. > > >>>>>> Comparing to LLVM's AArch64Processors.td, this seems to be missing a > >>>>>> few > >>> things: > >>>>>> - Crpyto extensions (SHA2 and AES, and SHA3 from apple-m1); > >>>>> > >>>>> I do not see FEAT_SHA2 listed in either the Arm doc, or the output from > >>>>> the > >>> sysctl. > >>>>> FEAT_AES: 1 > >>>>> FEAT_SHA3: 1 > >>>>> So I’ve added those to the three entries. > >>>> > >>>> There some architecture feature names that are effectively aliases in > >>>> the spec, > >>>> although identifying this requires reading the restrictions of the id > >>>> register > >>>> fields (and at least one version of the spec accidentally omitted one of > >>>> the > >>>> dependencies). In summary: > >>>> - +sha2 = FEAT_SHA1 and FEAT_SHA256 > >>>> - +aes = FEAT_AES and FEAT_PMULL > >>>> - +sha3 = FEAT_SHA512 and FEAT_SHA3 > >>> > >>> thanks - that was not obvious. > >>> > >>> However, if I add any of these to the 8.4 spec, LLVM’s back end (at least > >>> the ones > >>> via xcode) drops the arch rev down and we fail to build libgcc because of > >>> missing > >>> support for fp16. > >>> > >>> This is likely a bug - but I don’t really know how to describe it at the > >>> moment - and > >>> it won’t make any difference to the assemblers already in the wild - so I > >>> will leave > >>> these out of the list for now. > >>> > >>>>>> - New flags I just added (FRINTTS and FLAGM2 from apple-m1); > >>>>> FEAT_FRINTTS: 1 > >>>>> FEAT_FlagM2: 1 > >>>>> So I;ve added those. > >>> > >>> The build with these added succeeded with no change in test results. > > So I have found a way that LLVM’s backend is happy with (I need to test > across more xcode versions .. but it’s a start): > > AARCH64_CORE("apple-m1", applem1, cortexa57, V8_4A, (AES, SHA2, SHA3, > F16FML, SB, SSBS, FRINTTS, FLAGM2), generic_armv8_a, 0x61, 0x023, -1) > AARCH64_CORE("apple-m2", applem2, cortexa57, V8_4A, (I8MM, BF16, AES, SHA2, > SHA3, F16FML, SB, SSBS, FRINTTS, FLAGM2), generic_armv8_a, 0x61, 0x033, -1) > AARCH64_CORE("apple-m3", applem3, cortexa57, V8_4A, (I8MM, BF16, AES, SHA2, > SHA3, F16FML, SB, SSBS, FRINTTS, FLAGM2), generic_armv8_a, 0x61, 0x048, -1) > > So, although FP16FML is implicit in 8.4 but F16 is not - it seems that I > cannot specify the missing F16 without causing the other part to get switched > off). Specifying F16FML is OK because that switches on F16 and is part of > 8.4 anyway…. I don't know that I've seen all of the relevant bugs, but after some experimentation I think I've identified at least most of the issues in LLVM. For testing purposes, it's convenient to try things out in Compiler Explorer. An example test can be seen at https://godbolt.org/z/Ez6a7YE7v. Note that there isn't a standalone LLVM assembler at godbolt.org configured for AArch64, but their assembler can be made to target any architecture at runtime. This does cause the subsequent disassembly view to break, because that is still targetting x86, but for our current purposes we only care about the error output from the assembler. My conclusion from this is that bugs appear to exist in LLVM 16 and were fixed in LLVM 17. There are two relevant patches that I could find, both listed at the end of the email. At the time of these patches, LLVM would ignore any unrecognised feature names in .arch directives (I haven't checked whether this is still the case). It also would only update the set of available features after successfully parsing a feature name in a .arch directive, so any bare architecture version names (with no extra features) would be ignored. Thus it turns out that "+crc" emission workaround in GCC to handle Binutils cpu specification bugs also turns out to be an effective workaround for LLVM ignoring bare architecture version directives. On the other hand, as soon LLVM (<= 16.0) has parsed any .arch directive that includes recognised feature names, then it becomes impossible to access any features weren't recognised in that part of the assembler. Before this point, those features might still have been available due to them being specified on the command line and not yet overridden. So to give a couple of examples: -march = armv8.4-a+fp16 .arch armv8.4-a+fp16 leaves fp16/fp16fml instructions available, because the .arch line had no recognised feature names and was thus ignored. On the other hand: -march = armv8.4-a+fp16 .arch armv8.4-a+crc+fp16 leaves fp16/fp16fml instructions disabled. The existing +crc workaround for Binutils is already an effective workaround for the other bug (failing to update the available features), so I think it makes sense to leave that in place unchanged. The only problematic directive we emit (ignoring the other bug) is: .arch armv8-a in which case we end up leaving more instructions enabled than intended, unless we had previously specified +nosimd or +nofp. I don't think there's any sensible way to work around the unrecognised feature names in .arch directives (without using the excessively large 'hammer' of enabling all required features on the command line and then deleting all .arch directives). > > >>>>>> - PREDRES (from apple-m1) > >>>>> > >>>>> I cannot find FEAT_PREDRES … > >>>>> … however we do have > >>>>> FEAT_SPECRES: 0 > >>>> > >>>> FEAT_SPECRES in the architecture spec is the same as the +predres > >>>> toolchain > >>>> flag. LLVM seems to think the is supported from apple-m1. > > So what do we do about this? > Do we assume that this is a bug in the sysctl reporting? > Is there anyone on the LLVM toolchain team or within Apple you folks could > query? > I can try via the Apple Open Source folks - but not sure how long that will > take. > > if we could go with 8.5 and 8.6 that would simplify things; > Also I want to add apple-m4 soon and that also reports FEAT_SPECRES = 0. > > thanks, > Iain > > ------- The two LLVM patches, showing only the relevant functional changes (i.e. excluding changes to tests, comments and release notes): commit a499d675ae163429adff0bc3dc4f8215c12441c7 Author: Martin Storsjö <mar...@martin.st> Date: Fri Jun 2 12:25:52 2023 +0300 [AArch64] Make .arch without extra features actually take effect This fixes PR32873 / https://github.com/llvm/llvm-project/issues/32220. Differential Revision: https://reviews.llvm.org/D151982 diff --git a/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp b/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp index 3703cd03e10e..1b54e2cb2b29 100644 --- a/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp +++ b/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp @@ -6920,6 +6920,7 @@ bool AArch64AsmParser::parseDirectiveArch(SMLoc L) { ExpandCryptoAEK(*ArchInfo, RequestedExtensions); FeatureBitset Features = STI.getFeatureBits(); + setAvailableFeatures(ComputeAvailableFeatures(Features)); for (auto Name : RequestedExtensions) { bool EnableFeature = true; commit 4b8d9abca7d0280878fb12de331e688ee85d7cd8 Author: Martin Storsjö <mar...@martin.st> Date: Fri Jun 2 11:36:40 2023 +0300 [AArch64] Complete the list of extensions supported by .arch and .arch_extension This brings the list of extensions supported here up to date with what is supported by current git versions of binutils. Also add a comment to AArch64TargetParser to remind people to consider adding new ones to the list supported in assembly. In the case of the "rdma" extension, there's a slight surprise: LLVM knows of the extension under the name "rdm", while binutils has it named "rdma". However, binutils appears to accept any abbreviated prefix of an arch extension, so it does accept the form "rdm" too even if it formally considers it called "rdma". Support both spellings for the extensions here, for simplicity. Differential Revision: https://reviews.llvm.org/D151981 diff --git a/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp b/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp index beb360342f91..3703cd03e10e 100644 --- a/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp +++ b/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp @@ -3677,10 +3677,24 @@ static const struct Extension { {"cssc", {AArch64::FeatureCSSC}}, {"rcpc3", {AArch64::FeatureRCPC3}}, {"gcs", {AArch64::FeatureGCS}}, - // FIXME: Unsupported extensions - {"lor", {}}, - {"rdma", {}}, - {"profile", {}}, + {"bf16", {AArch64::FeatureBF16}}, + {"compnum", {AArch64::FeatureComplxNum}}, + {"dotprod", {AArch64::FeatureDotProd}}, + {"f32mm", {AArch64::FeatureMatMulFP32}}, + {"f64mm", {AArch64::FeatureMatMulFP64}}, + {"fp16", {AArch64::FeatureFullFP16}}, + {"fp16fml", {AArch64::FeatureFP16FML}}, + {"i8mm", {AArch64::FeatureMatMulInt8}}, + {"lor", {AArch64::FeatureLOR}}, + {"profile", {AArch64::FeatureSPE}}, + // "rdma" is the name documented by binutils for the feature, but + // binutils also accepts incomplete prefixes of features, so "rdm" + // works too. Support both spellings here. + {"rdm", {AArch64::FeatureRDM}}, + {"rdma", {AArch64::FeatureRDM}}, + {"sb", {AArch64::FeatureSB}}, + {"ssbs", {AArch64::FeatureSSBS}}, + {"tme", {AArch64::FeatureTME}}, }; static void setRequiredFeatureString(FeatureBitset FBS, std::string &Str) {