On Tue, Jan 21, 2025 at 08:21:45PM +0000, Iain Sandoe wrote:
> 
> 
> > On 20 Jan 2025, at 18:33, Andrew Carlotti <andrew.carlo...@arm.com> wrote:
> > 
> > On Mon, Jan 20, 2025 at 06:29:12PM +0000, Tamar Christina wrote:
> >>> -----Original Message-----
> >>> From: Iain Sandoe <i...@sandoe.co.uk>
> >>> Sent: Monday, January 20, 2025 6:15 PM
> >>> To: Andrew Carlotti <andrew.carlo...@arm.com>
> >>> Cc: Kyrylo Tkachov <ktkac...@nvidia.com>; GCC Patches <gcc-
> >>> patc...@gcc.gnu.org>; Tamar Christina <tamar.christ...@arm.com>; Richard
> >>> Sandiford <richard.sandif...@arm.com>; Sam James <s...@gentoo.org>
> >>> Subject: Re: [PATCH] aarch64: Provide initial specifications for Apple 
> >>> CPU cores.
> >>> 
> >>> 
> >>> 
> >>>> On 20 Jan 2025, at 17:38, Andrew Carlotti <andrew.carlo...@arm.com> 
> >>>> wrote:
> >>>> 
> >>>> On Sun, Jan 19, 2025 at 09:14:17PM +0000, Iain Sandoe wrote:
> 
> >> 
> >> I would say if we can find both core IDs we should use them, otherwise 
> >> this is already
> >> an improvement on the situation.
> > 
> > There are some part numbers listed at:
> > https://github.com/AsahiLinux/docs/wiki/HW:ARM-System-Registers
> > 
> > That only seems to cover apple-a12 and apple-m1.
> 
> The latest llvm Host.cpp contains the opposite problem:
> apple-m1 is mapped to : 0x20, 21, 22, 23, 24, 25, 28, 29
> apple-m2 is mapped to : 0x30, 31, 32, 33, 34, 35, 38, 39
> apple-m3 is mapped to : 0x48, 49
> 
> (if I had to speculate I might say that odd or even numbers related to 
> big/litte - but the
>  number of options ….)

If all of those numbers are correct, and assuming they always come in
consecutive big/little pairs, then I think you can duplicate the names and do
something like:

AARCH64_CORE("apple-m1", applem1, cortexa57, V8_4A,  (<FEATURES>), 
generic_armv8_a, 0x61, AARCH64_BIG_LITTLE (0x20, 0x21), -1)
AARCH64_CORE("apple-m1", applem1a, cortexa57, V8_4A,  (<FEATURES>), 
generic_armv8_a, 0x61, AARCH64_BIG_LITTLE (0x22, 0x23), -1)
AARCH64_CORE("apple-m1", applem1b, cortexa57, V8_4A,  (<FEATURES>), 
generic_armv8_a, 0x61, AARCH64_BIG_LITTLE (0x24, 0x25), -1)
AARCH64_CORE("apple-m1", applem1c, cortexa57, V8_4A,  (<FEATURES>), 
generic_armv8_a, 0x61, AARCH64_BIG_LITTLE (0x28, 0x29), -1)

That is, you can use the same string for each one, but different IDENTs.  The
cost model should eventually be something better than this, and this isn't a
comment on whether the listed part IDs are correct.  However, this should be a
valid way of representing the given cpu data within GCC.

> 
> >>>>>> Comparing to LLVM's AArch64Processors.td, this seems to be missing a 
> >>>>>> few
> >>> things:
> >>>>>> - Crpyto extensions (SHA2 and AES, and SHA3 from apple-m1);
> >>>>> 
> >>>>> I do not see FEAT_SHA2 listed in either the Arm doc, or the output from 
> >>>>> the
> >>> sysctl.
> >>>>> FEAT_AES: 1
> >>>>> FEAT_SHA3: 1
> >>>>> So I’ve added those to the three entries.
> >>>> 
> >>>> There some architecture feature names that are effectively aliases in 
> >>>> the spec,
> >>>> although identifying this requires reading the restrictions of the id 
> >>>> register
> >>>> fields (and at least one version of the spec accidentally omitted one of 
> >>>> the
> >>>> dependencies).  In summary:
> >>>> - +sha2 = FEAT_SHA1 and FEAT_SHA256
> >>>> - +aes = FEAT_AES and FEAT_PMULL
> >>>> - +sha3 = FEAT_SHA512 and FEAT_SHA3
> >>> 
> >>> thanks - that was not obvious.
> >>> 
> >>> However, if I add any of these to the 8.4 spec, LLVM’s back end (at least 
> >>> the ones
> >>> via xcode) drops the arch rev down and we fail to build libgcc because of 
> >>> missing
> >>> support for fp16.
> >>> 
> >>> This is likely a bug - but I don’t really know how to describe it at the 
> >>> moment - and
> >>> it won’t make any difference to the assemblers already in the wild - so I 
> >>> will leave
> >>> these out of the list for now.
> >>> 
> >>>>>> - New flags I just added (FRINTTS and FLAGM2 from apple-m1);
> >>>>> FEAT_FRINTTS: 1
> >>>>> FEAT_FlagM2: 1
> >>>>> So I;ve added those.
> >>> 
> >>> The build with these added succeeded with no change in test results.
> 
> So I have found a way that LLVM’s backend is happy with (I need to test 
> across more xcode versions .. but it’s a start):
> 
>  AARCH64_CORE("apple-m1", applem1, cortexa57, V8_4A,  (AES, SHA2, SHA3, 
> F16FML, SB, SSBS, FRINTTS, FLAGM2), generic_armv8_a, 0x61, 0x023, -1)
> AARCH64_CORE("apple-m2", applem2, cortexa57, V8_4A,  (I8MM, BF16, AES, SHA2, 
> SHA3, F16FML, SB, SSBS, FRINTTS, FLAGM2), generic_armv8_a, 0x61, 0x033, -1)
> AARCH64_CORE("apple-m3", applem3, cortexa57, V8_4A,  (I8MM, BF16, AES, SHA2, 
> SHA3, F16FML, SB, SSBS, FRINTTS, FLAGM2), generic_armv8_a, 0x61, 0x048, -1)
> 
> So, although FP16FML is implicit in 8.4 but F16 is not - it seems that I 
> cannot specify the missing F16 without causing the other part to get switched 
> off).  Specifying F16FML is OK because that switches on F16 and is part of 
> 8.4 anyway….

I don't know that I've seen all of the relevant bugs, but after some
experimentation I think I've identified at least most of the issues in LLVM.

For testing purposes, it's convenient to try things out in Compiler Explorer.
An example test can be seen at https://godbolt.org/z/Ez6a7YE7v.  Note that
there isn't a standalone LLVM assembler at godbolt.org configured for AArch64,
but their assembler can be made to target any architecture at runtime.  This
does cause the subsequent disassembly view to break, because that is still
targetting x86, but for our current purposes we only care about the error
output from the assembler.

My conclusion from this is that bugs appear to exist in LLVM 16 and were fixed
in LLVM 17.  There are two relevant patches that I could find, both listed at
the end of the email.

At the time of these patches, LLVM would ignore any unrecognised feature names
in .arch directives (I haven't checked whether this is still the case).  It
also would only update the set of available features after successfully parsing
a feature name in a .arch directive, so any bare architecture version names
(with no extra features) would be ignored.

Thus it turns out that "+crc" emission workaround in GCC to handle Binutils cpu
specification bugs also turns out to be an effective workaround for LLVM
ignoring bare architecture version directives.  On the other hand, as soon LLVM
(<= 16.0) has parsed any .arch directive that includes recognised feature
names, then it becomes impossible to access any features weren't recognised in
that part of the assembler.  Before this point, those features might still have
been available due to them being specified on the command line and not yet
overridden.


So to give a couple of examples:

-march = armv8.4-a+fp16
.arch armv8.4-a+fp16

leaves fp16/fp16fml instructions available, because the .arch line had no
recognised feature names and was thus ignored.


On the other hand:

-march = armv8.4-a+fp16
.arch armv8.4-a+crc+fp16

leaves fp16/fp16fml instructions disabled.


The existing +crc workaround for Binutils is already an effective workaround
for the other bug (failing to update the available features), so I think it
makes sense to leave that in place unchanged.  The only problematic directive
we emit (ignoring the other bug) is:

.arch armv8-a

in which case we end up leaving more instructions enabled than intended, unless
we had previously specified +nosimd or +nofp.


I don't think there's any sensible way to work around the unrecognised feature
names in .arch directives (without using the excessively large 'hammer' of
enabling all required features on the command line and then deleting all .arch
directives).


> 
> >>>>>> - PREDRES (from apple-m1)
> >>>>> 
> >>>>> I cannot find FEAT_PREDRES …
> >>>>> … however we do have
> >>>>> FEAT_SPECRES: 0
> >>>> 
> >>>> FEAT_SPECRES in the architecture spec is the same as the +predres 
> >>>> toolchain
> >>>> flag.  LLVM seems to think the is supported from apple-m1.
> 
> So what do we do about this?
> Do we assume that this is a bug in the sysctl reporting?
> Is there anyone on the LLVM toolchain team or within Apple you folks could 
> query?
> I can try via the Apple Open Source folks - but not sure how long that will 
> take.
> 
> if we could go with 8.5 and 8.6 that would simplify things;
> Also I want to add apple-m4 soon and that also reports FEAT_SPECRES  = 0.
> 
> thanks,
> Iain
> 
> 

-------
The two LLVM patches, showing only the relevant functional changes (i.e.
excluding changes to tests, comments and release notes):


commit a499d675ae163429adff0bc3dc4f8215c12441c7
Author: Martin Storsjö <mar...@martin.st>
Date:   Fri Jun 2 12:25:52 2023 +0300

    [AArch64] Make .arch without extra features actually take effect
    
    This fixes PR32873 / https://github.com/llvm/llvm-project/issues/32220.
    
    Differential Revision: https://reviews.llvm.org/D151982

diff --git a/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp 
b/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
index 3703cd03e10e..1b54e2cb2b29 100644
--- a/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
+++ b/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
@@ -6920,6 +6920,7 @@ bool AArch64AsmParser::parseDirectiveArch(SMLoc L) {
   ExpandCryptoAEK(*ArchInfo, RequestedExtensions);
 
   FeatureBitset Features = STI.getFeatureBits();
+  setAvailableFeatures(ComputeAvailableFeatures(Features));
   for (auto Name : RequestedExtensions) {
     bool EnableFeature = true;
 

commit 4b8d9abca7d0280878fb12de331e688ee85d7cd8
Author: Martin Storsjö <mar...@martin.st>
Date:   Fri Jun 2 11:36:40 2023 +0300

    [AArch64] Complete the list of extensions supported by .arch and 
.arch_extension
    
    This brings the list of extensions supported here up to date
    with what is supported by current git versions of binutils.
    
    Also add a comment to AArch64TargetParser to remind people to
    consider adding new ones to the list supported in assembly.
    
    In the case of the "rdma" extension, there's a slight surprise:
    LLVM knows of the extension under the name "rdm", while binutils
    has it named "rdma". However, binutils appears to accept any
    abbreviated prefix of an arch extension, so it does accept the
    form "rdm" too even if it formally considers it called "rdma".
    
    Support both spellings for the extensions here, for simplicity.
    
    Differential Revision: https://reviews.llvm.org/D151981

diff --git a/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp 
b/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
index beb360342f91..3703cd03e10e 100644
--- a/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
+++ b/llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
@@ -3677,10 +3677,24 @@ static const struct Extension {
     {"cssc", {AArch64::FeatureCSSC}},
     {"rcpc3", {AArch64::FeatureRCPC3}},
     {"gcs", {AArch64::FeatureGCS}},
-    // FIXME: Unsupported extensions
-    {"lor", {}},
-    {"rdma", {}},
-    {"profile", {}},
+    {"bf16", {AArch64::FeatureBF16}},
+    {"compnum", {AArch64::FeatureComplxNum}},
+    {"dotprod", {AArch64::FeatureDotProd}},
+    {"f32mm", {AArch64::FeatureMatMulFP32}},
+    {"f64mm", {AArch64::FeatureMatMulFP64}},
+    {"fp16", {AArch64::FeatureFullFP16}},
+    {"fp16fml", {AArch64::FeatureFP16FML}},
+    {"i8mm", {AArch64::FeatureMatMulInt8}},
+    {"lor", {AArch64::FeatureLOR}},
+    {"profile", {AArch64::FeatureSPE}},
+    // "rdma" is the name documented by binutils for the feature, but
+    // binutils also accepts incomplete prefixes of features, so "rdm"
+    // works too. Support both spellings here.
+    {"rdm", {AArch64::FeatureRDM}},
+    {"rdma", {AArch64::FeatureRDM}},
+    {"sb", {AArch64::FeatureSB}},
+    {"ssbs", {AArch64::FeatureSSBS}},
+    {"tme", {AArch64::FeatureTME}},
 };
 
 static void setRequiredFeatureString(FeatureBitset FBS, std::string &Str) {


Reply via email to