On Thu, Jan 09, 2025 at 06:00:34PM +0000, Richard Sandiford wrote:
> Richard Sandiford <richard.sandif...@arm.com> writes:
> > Andrew Carlotti <andrew.carlo...@arm.com> writes:
> >> On Mon, Nov 25, 2024 at 11:26:39PM +0000, Richard Sandiford wrote:
> >>> Sorry for the slow review.
> >>> 
> >>> Andrew Carlotti <andrew.carlo...@arm.com> writes:
> >>> > These new flags (+fcma, +jscvt, +rcpc2, +jscvt, +frintts, +wfxt and +xs)
> >>> > were only recently added to the assembler.  To improve compatibility
> >>> > with older assemblers, we try to avoid passing these new flags to the
> >>> > assembler if we can express the targetted architecture without them. We
> >>> > do so by using an almost-equivalent architecture string with a higher
> >>> > architecture version.
> >>> >
> >>> > This should never reduce the set of instructions accepted by the
> >>> > assembler.  It will make it more lenient in two cases:
> >>> >
> >>> > 1. Many system registers are currently gated behind architecture
> >>> > versions instead of specific feature flags.  Increasing the base
> >>> > architecture version may cause more system register accesses to be
> >>> > accepted.
> >>> >
> >>> > 2. FEAT_XS doesn't have an HWCAP bit or cpuinfo entry.  We still want to
> >>> > avoid passing +wfxt or +noxs to the assembler if possible, so we'll
> >>> > instruct the assembler to accept FEAT_XS instructions as well whenever
> >>> > the rest of the new features are enabled.
> >>> >
> >>> > gcc/ChangeLog:
> >>> >
> >>> >         * common/config/aarch64/aarch64-common.cc
> >>> >         (aarch64_get_arch_string_for_assembler): New.
> >>> >         (aarch64_rewrite_march): New.
> >>> >         (aarch64_rewrite_selected_cpu): Call new function.
> >>> >         * config/aarch64/aarch64-elf.h (ASM_SPEC): Remove identity 
> >>> > mapping.
> >>> >         * config/aarch64/aarch64-protos.h
> >>> >         (aarch64_get_arch_string_for_assembler): New.
> >>> >         * config/aarch64/aarch64.cc
> >>> >         (aarch64_declare_function_name): Call new function.
> >>> >         (aarch64_start_file): Ditto.
> >>> >         * config/aarch64/aarch64.h
> >>> >         * config/aarch64/aarch64.h
> >>> >         (EXTRA_SPEC_FUNCTIONS): Use new macro name.
> >>> >         (MCPU_TO_MARCH_SPEC): Rename to...
> >>> >         (MARCH_REWRITE_SPEC): ...this, and add new spec rule.
> >>> >         (aarch64_rewrite_march): New declaration.
> >>> >         (MCPU_TO_MARCH_SPEC_FUNCTIONS): Rename to...
> >>> >         (MARCH_REWRITE_SPEC_FUNCTIONS): ...this, and add new function.
> >>> >         (ASM_CPU_SPEC): Use new macro name.
> >>> >
> >>> > gcc/testsuite/ChangeLog:
> >>> >
> >>> >         * gcc.target/aarch64/cpunative/native_cpu_21.c: Update check.
> >>> >         * gcc.target/aarch64/cpunative/native_cpu_22.c: Update check.
> >>> >         * gcc.target/aarch64/cpunative/info_27: New test.
> >>> >         * gcc.target/aarch64/cpunative/info_28: New test.
> >>> >         * gcc.target/aarch64/cpunative/info_29: New test.
> >>> >         * gcc.target/aarch64/cpunative/native_cpu_27.c: New test.
> >>> >         * gcc.target/aarch64/cpunative/native_cpu_28.c: New test.
> >>> >         * gcc.target/aarch64/cpunative/native_cpu_29.c: New test.
> >>> >
> >>> >
> >>> > diff --git a/gcc/common/config/aarch64/aarch64-common.cc 
> >>> > b/gcc/common/config/aarch64/aarch64-common.cc
> >>> > index 
> >>> > 2bfc597e333b6018970a9ee6e370a66b6d0960ef..717b3238be16f39a6fd1b4143662eb540ccf292d
> >>> >  100644
> >>> > --- a/gcc/common/config/aarch64/aarch64-common.cc
> >>> > +++ b/gcc/common/config/aarch64/aarch64-common.cc
> >>> > @@ -371,6 +371,119 @@ aarch64_get_extension_string_for_isa_flags
> >>> >    return outstr;
> >>> >  }
> >>> >  
> >>> > +/* Generate an arch string to be passed to the assembler.
> >>> > +
> >>> > +   Several flags were added retrospectively for features that were 
> >>> > previously
> >>> > +   enabled only by specifying an architecture version.  We want to 
> >>> > avoid
> >>> > +   passing these flags to the assembler if possible, to improve 
> >>> > compatibility
> >>> > +   with older assemblers.  */
> >>> > +
> >>> > +std::string
> >>> > +aarch64_get_arch_string_for_assembler (aarch64_arch arch,
> >>> > +                                      aarch64_feature_flags flags)
> >>> > +{
> >>> > +  if (!(flags & AARCH64_FL_FCMA) || !(flags & AARCH64_FL_JSCVT))
> >>> > +    goto done;
> >>> > +
> >>> > +  if (arch == AARCH64_ARCH_V8A
> >>> > +      || arch == AARCH64_ARCH_V8_1A
> >>> > +      || arch == AARCH64_ARCH_V8_2A)
> >>> > +    arch = AARCH64_ARCH_V8_3A;
> >>> > +
> >>> > +  if (!(flags & AARCH64_FL_RCPC2))
> >>> > +    goto done;
> >>> > +
> >>> > +  if (arch == AARCH64_ARCH_V8_3A)
> >>> > +    arch = AARCH64_ARCH_V8_4A;
> >>> > +
> >>> > +  if (!(flags & AARCH64_FL_FRINTTS) || !(flags & AARCH64_FL_FLAGM2))
> >>> > +    goto done;
> >>> > +
> >>> > +  if (arch == AARCH64_ARCH_V8_4A)
> >>> > +    arch = AARCH64_ARCH_V8_5A;
> >>> > +
> >>> > +  if (!(flags & AARCH64_FL_WFXT))
> >>> > +    goto done;
> >>> > +
> >>> > +  if (arch == AARCH64_ARCH_V8_5A || arch == AARCH64_ARCH_V8_6A)
> >>> > +    {
> >>> > +      arch = AARCH64_ARCH_V8_7A;
> >>> > +      /* We don't support native detection for FEAT_XS, so we'll 
> >>> > assume it's
> >>> > +        present if the rest of these features are also present.  If we 
> >>> > don't
> >>> > +        do this, then we would end up passing +noxs to the assembler.  
> >>> > */
> >>> > +      flags |= AARCH64_FL_XS;
> >>> > +    }
> >>> > +done:
> >>> > +
> >>> > +  const struct arch_to_arch_name* a_to_an;
> >>> > +  for (a_to_an = all_architectures;
> >>> > +       a_to_an->arch != aarch64_no_arch;
> >>> > +       a_to_an++)
> >>> > +    {
> >>> > +      if (a_to_an->arch == arch)
> >>> > +       break;
> >>> > +    }
> >>> > +
> >>> > +  std::string outstr = a_to_an->arch_name
> >>> > +       + aarch64_get_extension_string_for_isa_flags (flags, 
> >>> > a_to_an->flags);
> >>> > +
> >>> > +  return outstr;
> >>> > +}
> >>> 
> >>> I was hoping we could do this in a table-driven way.  Experimenting
> >>> a bit locally (but only lightly tested), the following seems to work:
> >>> 
> >>> aarch64.h:
> >>> 
> >>> /* The set of all architecture flags.  */
> >>> constexpr auto AARCH64_FL_ARCHES ATTRIBUTE_UNUSED = aarch64_feature_flags 
> >>> (0)
> >>> #define AARCH64_ARCH(A, B, ARCH_IDENT, D, E) \
> >>>   | feature_deps::ARCH_IDENT ().flag
> >>> #include "config/aarch64/aarch64-arches.def"
> >>> ;
> >>> 
> >>> aarch64-common.cc:
> >>> 
> >>> ...
> >>>   const struct arch_to_arch_name *best = nullptr;
> >>>   for (auto *a_to_an = all_architectures;
> >>>        a_to_an->arch != aarch64_no_arch;
> >>>        a_to_an++)
> >>>     {
> >>>       /* Require the architecture to have all architecture flags in 
> >>> FLAGS.  */
> >>>       if ((~a_to_an->flags & flags & AARCH64_FL_ARCHES) != 0)
> >>>   continue;
> >>> 
> >>>       /* Skip architectures that add no new mandatory features.  */
> >>>       if (best && (a_to_an->flags & ~best->flags & ~AARCH64_FL_ARCHES) == 
> >>> 0)
> >>>   continue;
> >>> 
> >>>       /* Require FLAGS to include all mandatory extensions.  */
> >>>       if ((a_to_an->flags & ~flags & ~AARCH64_FL_ARCHES) != 0)
> >>>         continue;
> >>> 
> >>>       best = a_to_an;
> >>>     }
> >>
> >> There are some hypothetical cases in which your suggested approach 
> >> wouldn't be
> >> able to avoid the new feature flag, whereas my more targetted approach here
> >> would.  I'm struggling to think of a realistic example though, as I can 
> >> only
> >> think of one core in which any of these features have been backported to an
> >> earlier architecture version.  The closest bad examples I can find are:
> >>
> >> - Using -mcpu=a64fx+nosve: This would canonicalise to armv8.2-a+f16+fcma 
> >> with
> >>   either approach.  If jscvt didn't exist, however, then my approach would
> >>   instead give armv8.3-a+nopauth+norcpc.
> >> - Running on an unrecognised future cpu with an unfortunate combination of
> >>   features could also be an example, but that's probably unrealistic.
> >
> > Thanks for considering these cases.  But like you say, the first one
> > is counterfactual, in that we do have jscvt.  And personally, I think
> > it's less surprising to add +fcma in that case anyway, rather than have
> > +nosve bump the architecture version from armv8.2-a to armv8.3-a.
> > It seems odd for +no to increase the architecture level.
> >
> > So IMO, the fact that we use the "correct" base architecture is a plus
> > rather than a minus.  This is more about choosing between multiple ways
> > of telling a direct truth, rather than trying to be indirect.
> >
> >> I also feel that this is a work-around that we should be applying in 
> >> moderation
> >> only when we know it would actually help with compatibility issues.
> >
> > I can see that.  But I think it's easier to reason about, and less
> > surprising, if the behaviour stems from easily-understood rules,
> > rather than being a list of special cases.
> >
> > Also, I think replacing -march=armv8-a+all+the+features+for+v8.8-a
> > with -march=armv8.8-a is an independent good.  So...
> >
> >> However, if you'd still prefer to use a bigger table-driven hammer, then I 
> >> can
> >> change the patch to do that.
> >
> > ...yeah, I would still prefer that, sorry. :)
> 
> To recap, the original problem that we're trying to solve is:
> 
>   In addition to Andrew P's comment about documentation, doesn't this
>   mean that -mcpu=native will now emit +fcma .arch strings for
>   unrecognised CPUs (i.e. those for which we can't establish a
>   baseline beyond Armv8-A?).  E.g., I think:
> 
>   processor   : 0
>   BogoMIPS    : 100.00
>   Features        : fp asimd atomics crc32 asimdrdm paca pacg lrcpc fcma
>   CPU implementer     : 0xaa
>   CPU architecture: 8
>   CPU variant : 0xaa
>   CPU part    : 0xaa
>   CPU revision        : 0
> 
>   should enable all the Armv8.3-A features that GCC is aware of after
>   this patch, but we emit:
> 
>           .arch armv8-a+lse+rdma+crc+fcma+rcpc+pauth
> 
>   rather than:
> 
>           .arch armv8.3-a
> 
>   And that could be a problem because binutils support for the +fcma
>   name is relatively recent (your patch from January this year).
>   Assembling with older versions of gas is likely to fail, regardless
>   of whether the code uses FCMA.
> 
>   I think we might need to adjust the driver code so that it tries
>   to consolidate features into an architecture level where possible.
> 
> However, Andrew C pointed out off-list that this wouldn't solve
> the problem for Armv8.5-A and above, because predres is a mandatory
> feature for Armv8.5-A, but it does not have a /proc/cpuinfo identifier.
> We'd therefore never go beyond armv8.5-a and would add an explicit
> +flagm2 (etc.) for Armv8.6-A and above, or for Armv9-A and above.
> 
> And Armv8.5-A+ and Armv9-A+ are likely to be the only cases that
> matter, in that it's relatively unlikely that new CPUs would be
> below those baselines.
> 
> One option would have been to limit the procedure above to flags
> that have /proc/cpuinfo identifiers, but that makes the dangerous
> assumption that features without /proc/cpuinfo identifiers would
> never be used by the compiler (only directly via intrinsics or asm).
> 
> Another alternative would be to go back to the original patch,
> that sometimes goes to a higher architecture level and then
> adds +no options.  But that feels dangerous too.  Going back to
> the example above, it seems wrong in principle to emit
> -march=armv8.3-a+stuff for A64FX.
> 
> So after all that, I suppose we're just going to have to accept that
> anyone using -mcpu=native with GCC15+ on a target that GCC doesn't
> recognise will need to use binutils 2.42+.  It would be worth a mention
> in the release notes, under the "Caveats" section.
> 
> Does anyone disagree?
> 
> Thanks,
> Richard

I've pushed the first 9 patches in the series to master.  I'll also repost a
refactored version of this patch 10/10 with the extra canonicalisation
removed - that will be a good cleanup and will be useful if we want to try
tweaking the assembler target strings in any other way.

Reply via email to