On Thu, Jan 09, 2025 at 06:00:34PM +0000, Richard Sandiford wrote: > Richard Sandiford <richard.sandif...@arm.com> writes: > > Andrew Carlotti <andrew.carlo...@arm.com> writes: > >> On Mon, Nov 25, 2024 at 11:26:39PM +0000, Richard Sandiford wrote: > >>> Sorry for the slow review. > >>> > >>> Andrew Carlotti <andrew.carlo...@arm.com> writes: > >>> > These new flags (+fcma, +jscvt, +rcpc2, +jscvt, +frintts, +wfxt and +xs) > >>> > were only recently added to the assembler. To improve compatibility > >>> > with older assemblers, we try to avoid passing these new flags to the > >>> > assembler if we can express the targetted architecture without them. We > >>> > do so by using an almost-equivalent architecture string with a higher > >>> > architecture version. > >>> > > >>> > This should never reduce the set of instructions accepted by the > >>> > assembler. It will make it more lenient in two cases: > >>> > > >>> > 1. Many system registers are currently gated behind architecture > >>> > versions instead of specific feature flags. Increasing the base > >>> > architecture version may cause more system register accesses to be > >>> > accepted. > >>> > > >>> > 2. FEAT_XS doesn't have an HWCAP bit or cpuinfo entry. We still want to > >>> > avoid passing +wfxt or +noxs to the assembler if possible, so we'll > >>> > instruct the assembler to accept FEAT_XS instructions as well whenever > >>> > the rest of the new features are enabled. > >>> > > >>> > gcc/ChangeLog: > >>> > > >>> > * common/config/aarch64/aarch64-common.cc > >>> > (aarch64_get_arch_string_for_assembler): New. > >>> > (aarch64_rewrite_march): New. > >>> > (aarch64_rewrite_selected_cpu): Call new function. > >>> > * config/aarch64/aarch64-elf.h (ASM_SPEC): Remove identity > >>> > mapping. > >>> > * config/aarch64/aarch64-protos.h > >>> > (aarch64_get_arch_string_for_assembler): New. > >>> > * config/aarch64/aarch64.cc > >>> > (aarch64_declare_function_name): Call new function. > >>> > (aarch64_start_file): Ditto. > >>> > * config/aarch64/aarch64.h > >>> > * config/aarch64/aarch64.h > >>> > (EXTRA_SPEC_FUNCTIONS): Use new macro name. > >>> > (MCPU_TO_MARCH_SPEC): Rename to... > >>> > (MARCH_REWRITE_SPEC): ...this, and add new spec rule. > >>> > (aarch64_rewrite_march): New declaration. > >>> > (MCPU_TO_MARCH_SPEC_FUNCTIONS): Rename to... > >>> > (MARCH_REWRITE_SPEC_FUNCTIONS): ...this, and add new function. > >>> > (ASM_CPU_SPEC): Use new macro name. > >>> > > >>> > gcc/testsuite/ChangeLog: > >>> > > >>> > * gcc.target/aarch64/cpunative/native_cpu_21.c: Update check. > >>> > * gcc.target/aarch64/cpunative/native_cpu_22.c: Update check. > >>> > * gcc.target/aarch64/cpunative/info_27: New test. > >>> > * gcc.target/aarch64/cpunative/info_28: New test. > >>> > * gcc.target/aarch64/cpunative/info_29: New test. > >>> > * gcc.target/aarch64/cpunative/native_cpu_27.c: New test. > >>> > * gcc.target/aarch64/cpunative/native_cpu_28.c: New test. > >>> > * gcc.target/aarch64/cpunative/native_cpu_29.c: New test. > >>> > > >>> > > >>> > diff --git a/gcc/common/config/aarch64/aarch64-common.cc > >>> > b/gcc/common/config/aarch64/aarch64-common.cc > >>> > index > >>> > 2bfc597e333b6018970a9ee6e370a66b6d0960ef..717b3238be16f39a6fd1b4143662eb540ccf292d > >>> > 100644 > >>> > --- a/gcc/common/config/aarch64/aarch64-common.cc > >>> > +++ b/gcc/common/config/aarch64/aarch64-common.cc > >>> > @@ -371,6 +371,119 @@ aarch64_get_extension_string_for_isa_flags > >>> > return outstr; > >>> > } > >>> > > >>> > +/* Generate an arch string to be passed to the assembler. > >>> > + > >>> > + Several flags were added retrospectively for features that were > >>> > previously > >>> > + enabled only by specifying an architecture version. We want to > >>> > avoid > >>> > + passing these flags to the assembler if possible, to improve > >>> > compatibility > >>> > + with older assemblers. */ > >>> > + > >>> > +std::string > >>> > +aarch64_get_arch_string_for_assembler (aarch64_arch arch, > >>> > + aarch64_feature_flags flags) > >>> > +{ > >>> > + if (!(flags & AARCH64_FL_FCMA) || !(flags & AARCH64_FL_JSCVT)) > >>> > + goto done; > >>> > + > >>> > + if (arch == AARCH64_ARCH_V8A > >>> > + || arch == AARCH64_ARCH_V8_1A > >>> > + || arch == AARCH64_ARCH_V8_2A) > >>> > + arch = AARCH64_ARCH_V8_3A; > >>> > + > >>> > + if (!(flags & AARCH64_FL_RCPC2)) > >>> > + goto done; > >>> > + > >>> > + if (arch == AARCH64_ARCH_V8_3A) > >>> > + arch = AARCH64_ARCH_V8_4A; > >>> > + > >>> > + if (!(flags & AARCH64_FL_FRINTTS) || !(flags & AARCH64_FL_FLAGM2)) > >>> > + goto done; > >>> > + > >>> > + if (arch == AARCH64_ARCH_V8_4A) > >>> > + arch = AARCH64_ARCH_V8_5A; > >>> > + > >>> > + if (!(flags & AARCH64_FL_WFXT)) > >>> > + goto done; > >>> > + > >>> > + if (arch == AARCH64_ARCH_V8_5A || arch == AARCH64_ARCH_V8_6A) > >>> > + { > >>> > + arch = AARCH64_ARCH_V8_7A; > >>> > + /* We don't support native detection for FEAT_XS, so we'll > >>> > assume it's > >>> > + present if the rest of these features are also present. If we > >>> > don't > >>> > + do this, then we would end up passing +noxs to the assembler. > >>> > */ > >>> > + flags |= AARCH64_FL_XS; > >>> > + } > >>> > +done: > >>> > + > >>> > + const struct arch_to_arch_name* a_to_an; > >>> > + for (a_to_an = all_architectures; > >>> > + a_to_an->arch != aarch64_no_arch; > >>> > + a_to_an++) > >>> > + { > >>> > + if (a_to_an->arch == arch) > >>> > + break; > >>> > + } > >>> > + > >>> > + std::string outstr = a_to_an->arch_name > >>> > + + aarch64_get_extension_string_for_isa_flags (flags, > >>> > a_to_an->flags); > >>> > + > >>> > + return outstr; > >>> > +} > >>> > >>> I was hoping we could do this in a table-driven way. Experimenting > >>> a bit locally (but only lightly tested), the following seems to work: > >>> > >>> aarch64.h: > >>> > >>> /* The set of all architecture flags. */ > >>> constexpr auto AARCH64_FL_ARCHES ATTRIBUTE_UNUSED = aarch64_feature_flags > >>> (0) > >>> #define AARCH64_ARCH(A, B, ARCH_IDENT, D, E) \ > >>> | feature_deps::ARCH_IDENT ().flag > >>> #include "config/aarch64/aarch64-arches.def" > >>> ; > >>> > >>> aarch64-common.cc: > >>> > >>> ... > >>> const struct arch_to_arch_name *best = nullptr; > >>> for (auto *a_to_an = all_architectures; > >>> a_to_an->arch != aarch64_no_arch; > >>> a_to_an++) > >>> { > >>> /* Require the architecture to have all architecture flags in > >>> FLAGS. */ > >>> if ((~a_to_an->flags & flags & AARCH64_FL_ARCHES) != 0) > >>> continue; > >>> > >>> /* Skip architectures that add no new mandatory features. */ > >>> if (best && (a_to_an->flags & ~best->flags & ~AARCH64_FL_ARCHES) == > >>> 0) > >>> continue; > >>> > >>> /* Require FLAGS to include all mandatory extensions. */ > >>> if ((a_to_an->flags & ~flags & ~AARCH64_FL_ARCHES) != 0) > >>> continue; > >>> > >>> best = a_to_an; > >>> } > >> > >> There are some hypothetical cases in which your suggested approach > >> wouldn't be > >> able to avoid the new feature flag, whereas my more targetted approach here > >> would. I'm struggling to think of a realistic example though, as I can > >> only > >> think of one core in which any of these features have been backported to an > >> earlier architecture version. The closest bad examples I can find are: > >> > >> - Using -mcpu=a64fx+nosve: This would canonicalise to armv8.2-a+f16+fcma > >> with > >> either approach. If jscvt didn't exist, however, then my approach would > >> instead give armv8.3-a+nopauth+norcpc. > >> - Running on an unrecognised future cpu with an unfortunate combination of > >> features could also be an example, but that's probably unrealistic. > > > > Thanks for considering these cases. But like you say, the first one > > is counterfactual, in that we do have jscvt. And personally, I think > > it's less surprising to add +fcma in that case anyway, rather than have > > +nosve bump the architecture version from armv8.2-a to armv8.3-a. > > It seems odd for +no to increase the architecture level. > > > > So IMO, the fact that we use the "correct" base architecture is a plus > > rather than a minus. This is more about choosing between multiple ways > > of telling a direct truth, rather than trying to be indirect. > > > >> I also feel that this is a work-around that we should be applying in > >> moderation > >> only when we know it would actually help with compatibility issues. > > > > I can see that. But I think it's easier to reason about, and less > > surprising, if the behaviour stems from easily-understood rules, > > rather than being a list of special cases. > > > > Also, I think replacing -march=armv8-a+all+the+features+for+v8.8-a > > with -march=armv8.8-a is an independent good. So... > > > >> However, if you'd still prefer to use a bigger table-driven hammer, then I > >> can > >> change the patch to do that. > > > > ...yeah, I would still prefer that, sorry. :) > > To recap, the original problem that we're trying to solve is: > > In addition to Andrew P's comment about documentation, doesn't this > mean that -mcpu=native will now emit +fcma .arch strings for > unrecognised CPUs (i.e. those for which we can't establish a > baseline beyond Armv8-A?). E.g., I think: > > processor : 0 > BogoMIPS : 100.00 > Features : fp asimd atomics crc32 asimdrdm paca pacg lrcpc fcma > CPU implementer : 0xaa > CPU architecture: 8 > CPU variant : 0xaa > CPU part : 0xaa > CPU revision : 0 > > should enable all the Armv8.3-A features that GCC is aware of after > this patch, but we emit: > > .arch armv8-a+lse+rdma+crc+fcma+rcpc+pauth > > rather than: > > .arch armv8.3-a > > And that could be a problem because binutils support for the +fcma > name is relatively recent (your patch from January this year). > Assembling with older versions of gas is likely to fail, regardless > of whether the code uses FCMA. > > I think we might need to adjust the driver code so that it tries > to consolidate features into an architecture level where possible. > > However, Andrew C pointed out off-list that this wouldn't solve > the problem for Armv8.5-A and above, because predres is a mandatory > feature for Armv8.5-A, but it does not have a /proc/cpuinfo identifier. > We'd therefore never go beyond armv8.5-a and would add an explicit > +flagm2 (etc.) for Armv8.6-A and above, or for Armv9-A and above. > > And Armv8.5-A+ and Armv9-A+ are likely to be the only cases that > matter, in that it's relatively unlikely that new CPUs would be > below those baselines. > > One option would have been to limit the procedure above to flags > that have /proc/cpuinfo identifiers, but that makes the dangerous > assumption that features without /proc/cpuinfo identifiers would > never be used by the compiler (only directly via intrinsics or asm). > > Another alternative would be to go back to the original patch, > that sometimes goes to a higher architecture level and then > adds +no options. But that feels dangerous too. Going back to > the example above, it seems wrong in principle to emit > -march=armv8.3-a+stuff for A64FX. > > So after all that, I suppose we're just going to have to accept that > anyone using -mcpu=native with GCC15+ on a target that GCC doesn't > recognise will need to use binutils 2.42+. It would be worth a mention > in the release notes, under the "Caveats" section. > > Does anyone disagree? > > Thanks, > Richard
I've pushed the first 9 patches in the series to master. I'll also repost a refactored version of this patch 10/10 with the extra canonicalisation removed - that will be a good cleanup and will be useful if we want to try tweaking the assembler target strings in any other way.