Richard Sandiford <richard.sandif...@arm.com> writes:
> Andrew Carlotti <andrew.carlo...@arm.com> writes:
>> On Mon, Nov 25, 2024 at 11:26:39PM +0000, Richard Sandiford wrote:
>>> Sorry for the slow review.
>>> 
>>> Andrew Carlotti <andrew.carlo...@arm.com> writes:
>>> > These new flags (+fcma, +jscvt, +rcpc2, +jscvt, +frintts, +wfxt and +xs)
>>> > were only recently added to the assembler.  To improve compatibility
>>> > with older assemblers, we try to avoid passing these new flags to the
>>> > assembler if we can express the targetted architecture without them. We
>>> > do so by using an almost-equivalent architecture string with a higher
>>> > architecture version.
>>> >
>>> > This should never reduce the set of instructions accepted by the
>>> > assembler.  It will make it more lenient in two cases:
>>> >
>>> > 1. Many system registers are currently gated behind architecture
>>> > versions instead of specific feature flags.  Increasing the base
>>> > architecture version may cause more system register accesses to be
>>> > accepted.
>>> >
>>> > 2. FEAT_XS doesn't have an HWCAP bit or cpuinfo entry.  We still want to
>>> > avoid passing +wfxt or +noxs to the assembler if possible, so we'll
>>> > instruct the assembler to accept FEAT_XS instructions as well whenever
>>> > the rest of the new features are enabled.
>>> >
>>> > gcc/ChangeLog:
>>> >
>>> >   * common/config/aarch64/aarch64-common.cc
>>> >   (aarch64_get_arch_string_for_assembler): New.
>>> >   (aarch64_rewrite_march): New.
>>> >   (aarch64_rewrite_selected_cpu): Call new function.
>>> >   * config/aarch64/aarch64-elf.h (ASM_SPEC): Remove identity mapping.
>>> >   * config/aarch64/aarch64-protos.h
>>> >   (aarch64_get_arch_string_for_assembler): New.
>>> >   * config/aarch64/aarch64.cc
>>> >   (aarch64_declare_function_name): Call new function.
>>> >   (aarch64_start_file): Ditto.
>>> >   * config/aarch64/aarch64.h
>>> >   * config/aarch64/aarch64.h
>>> >   (EXTRA_SPEC_FUNCTIONS): Use new macro name.
>>> >   (MCPU_TO_MARCH_SPEC): Rename to...
>>> >   (MARCH_REWRITE_SPEC): ...this, and add new spec rule.
>>> >   (aarch64_rewrite_march): New declaration.
>>> >   (MCPU_TO_MARCH_SPEC_FUNCTIONS): Rename to...
>>> >   (MARCH_REWRITE_SPEC_FUNCTIONS): ...this, and add new function.
>>> >   (ASM_CPU_SPEC): Use new macro name.
>>> >
>>> > gcc/testsuite/ChangeLog:
>>> >
>>> >   * gcc.target/aarch64/cpunative/native_cpu_21.c: Update check.
>>> >   * gcc.target/aarch64/cpunative/native_cpu_22.c: Update check.
>>> >   * gcc.target/aarch64/cpunative/info_27: New test.
>>> >   * gcc.target/aarch64/cpunative/info_28: New test.
>>> >   * gcc.target/aarch64/cpunative/info_29: New test.
>>> >   * gcc.target/aarch64/cpunative/native_cpu_27.c: New test.
>>> >   * gcc.target/aarch64/cpunative/native_cpu_28.c: New test.
>>> >   * gcc.target/aarch64/cpunative/native_cpu_29.c: New test.
>>> >
>>> >
>>> > diff --git a/gcc/common/config/aarch64/aarch64-common.cc 
>>> > b/gcc/common/config/aarch64/aarch64-common.cc
>>> > index 
>>> > 2bfc597e333b6018970a9ee6e370a66b6d0960ef..717b3238be16f39a6fd1b4143662eb540ccf292d
>>> >  100644
>>> > --- a/gcc/common/config/aarch64/aarch64-common.cc
>>> > +++ b/gcc/common/config/aarch64/aarch64-common.cc
>>> > @@ -371,6 +371,119 @@ aarch64_get_extension_string_for_isa_flags
>>> >    return outstr;
>>> >  }
>>> >  
>>> > +/* Generate an arch string to be passed to the assembler.
>>> > +
>>> > +   Several flags were added retrospectively for features that were 
>>> > previously
>>> > +   enabled only by specifying an architecture version.  We want to avoid
>>> > +   passing these flags to the assembler if possible, to improve 
>>> > compatibility
>>> > +   with older assemblers.  */
>>> > +
>>> > +std::string
>>> > +aarch64_get_arch_string_for_assembler (aarch64_arch arch,
>>> > +                                aarch64_feature_flags flags)
>>> > +{
>>> > +  if (!(flags & AARCH64_FL_FCMA) || !(flags & AARCH64_FL_JSCVT))
>>> > +    goto done;
>>> > +
>>> > +  if (arch == AARCH64_ARCH_V8A
>>> > +      || arch == AARCH64_ARCH_V8_1A
>>> > +      || arch == AARCH64_ARCH_V8_2A)
>>> > +    arch = AARCH64_ARCH_V8_3A;
>>> > +
>>> > +  if (!(flags & AARCH64_FL_RCPC2))
>>> > +    goto done;
>>> > +
>>> > +  if (arch == AARCH64_ARCH_V8_3A)
>>> > +    arch = AARCH64_ARCH_V8_4A;
>>> > +
>>> > +  if (!(flags & AARCH64_FL_FRINTTS) || !(flags & AARCH64_FL_FLAGM2))
>>> > +    goto done;
>>> > +
>>> > +  if (arch == AARCH64_ARCH_V8_4A)
>>> > +    arch = AARCH64_ARCH_V8_5A;
>>> > +
>>> > +  if (!(flags & AARCH64_FL_WFXT))
>>> > +    goto done;
>>> > +
>>> > +  if (arch == AARCH64_ARCH_V8_5A || arch == AARCH64_ARCH_V8_6A)
>>> > +    {
>>> > +      arch = AARCH64_ARCH_V8_7A;
>>> > +      /* We don't support native detection for FEAT_XS, so we'll assume 
>>> > it's
>>> > +  present if the rest of these features are also present.  If we don't
>>> > +  do this, then we would end up passing +noxs to the assembler.  */
>>> > +      flags |= AARCH64_FL_XS;
>>> > +    }
>>> > +done:
>>> > +
>>> > +  const struct arch_to_arch_name* a_to_an;
>>> > +  for (a_to_an = all_architectures;
>>> > +       a_to_an->arch != aarch64_no_arch;
>>> > +       a_to_an++)
>>> > +    {
>>> > +      if (a_to_an->arch == arch)
>>> > + break;
>>> > +    }
>>> > +
>>> > +  std::string outstr = a_to_an->arch_name
>>> > + + aarch64_get_extension_string_for_isa_flags (flags, a_to_an->flags);
>>> > +
>>> > +  return outstr;
>>> > +}
>>> 
>>> I was hoping we could do this in a table-driven way.  Experimenting
>>> a bit locally (but only lightly tested), the following seems to work:
>>> 
>>> aarch64.h:
>>> 
>>> /* The set of all architecture flags.  */
>>> constexpr auto AARCH64_FL_ARCHES ATTRIBUTE_UNUSED = aarch64_feature_flags 
>>> (0)
>>> #define AARCH64_ARCH(A, B, ARCH_IDENT, D, E) \
>>>   | feature_deps::ARCH_IDENT ().flag
>>> #include "config/aarch64/aarch64-arches.def"
>>> ;
>>> 
>>> aarch64-common.cc:
>>> 
>>> ...
>>>   const struct arch_to_arch_name *best = nullptr;
>>>   for (auto *a_to_an = all_architectures;
>>>        a_to_an->arch != aarch64_no_arch;
>>>        a_to_an++)
>>>     {
>>>       /* Require the architecture to have all architecture flags in FLAGS.  
>>> */
>>>       if ((~a_to_an->flags & flags & AARCH64_FL_ARCHES) != 0)
>>>     continue;
>>> 
>>>       /* Skip architectures that add no new mandatory features.  */
>>>       if (best && (a_to_an->flags & ~best->flags & ~AARCH64_FL_ARCHES) == 0)
>>>     continue;
>>> 
>>>       /* Require FLAGS to include all mandatory extensions.  */
>>>       if ((a_to_an->flags & ~flags & ~AARCH64_FL_ARCHES) != 0)
>>>         continue;
>>> 
>>>       best = a_to_an;
>>>     }
>>
>> There are some hypothetical cases in which your suggested approach wouldn't 
>> be
>> able to avoid the new feature flag, whereas my more targetted approach here
>> would.  I'm struggling to think of a realistic example though, as I can only
>> think of one core in which any of these features have been backported to an
>> earlier architecture version.  The closest bad examples I can find are:
>>
>> - Using -mcpu=a64fx+nosve: This would canonicalise to armv8.2-a+f16+fcma with
>>   either approach.  If jscvt didn't exist, however, then my approach would
>>   instead give armv8.3-a+nopauth+norcpc.
>> - Running on an unrecognised future cpu with an unfortunate combination of
>>   features could also be an example, but that's probably unrealistic.
>
> Thanks for considering these cases.  But like you say, the first one
> is counterfactual, in that we do have jscvt.  And personally, I think
> it's less surprising to add +fcma in that case anyway, rather than have
> +nosve bump the architecture version from armv8.2-a to armv8.3-a.
> It seems odd for +no to increase the architecture level.
>
> So IMO, the fact that we use the "correct" base architecture is a plus
> rather than a minus.  This is more about choosing between multiple ways
> of telling a direct truth, rather than trying to be indirect.
>
>> I also feel that this is a work-around that we should be applying in 
>> moderation
>> only when we know it would actually help with compatibility issues.
>
> I can see that.  But I think it's easier to reason about, and less
> surprising, if the behaviour stems from easily-understood rules,
> rather than being a list of special cases.
>
> Also, I think replacing -march=armv8-a+all+the+features+for+v8.8-a
> with -march=armv8.8-a is an independent good.  So...
>
>> However, if you'd still prefer to use a bigger table-driven hammer, then I 
>> can
>> change the patch to do that.
>
> ...yeah, I would still prefer that, sorry. :)

To recap, the original problem that we're trying to solve is:

  In addition to Andrew P's comment about documentation, doesn't this
  mean that -mcpu=native will now emit +fcma .arch strings for
  unrecognised CPUs (i.e. those for which we can't establish a
  baseline beyond Armv8-A?).  E.g., I think:

  processor     : 0
  BogoMIPS      : 100.00
  Features        : fp asimd atomics crc32 asimdrdm paca pacg lrcpc fcma
  CPU implementer       : 0xaa
  CPU architecture: 8
  CPU variant   : 0xaa
  CPU part      : 0xaa
  CPU revision  : 0

  should enable all the Armv8.3-A features that GCC is aware of after
  this patch, but we emit:

          .arch armv8-a+lse+rdma+crc+fcma+rcpc+pauth

  rather than:

          .arch armv8.3-a

  And that could be a problem because binutils support for the +fcma
  name is relatively recent (your patch from January this year).
  Assembling with older versions of gas is likely to fail, regardless
  of whether the code uses FCMA.

  I think we might need to adjust the driver code so that it tries
  to consolidate features into an architecture level where possible.

However, Andrew C pointed out off-list that this wouldn't solve
the problem for Armv8.5-A and above, because predres is a mandatory
feature for Armv8.5-A, but it does not have a /proc/cpuinfo identifier.
We'd therefore never go beyond armv8.5-a and would add an explicit
+flagm2 (etc.) for Armv8.6-A and above, or for Armv9-A and above.

And Armv8.5-A+ and Armv9-A+ are likely to be the only cases that
matter, in that it's relatively unlikely that new CPUs would be
below those baselines.

One option would have been to limit the procedure above to flags
that have /proc/cpuinfo identifiers, but that makes the dangerous
assumption that features without /proc/cpuinfo identifiers would
never be used by the compiler (only directly via intrinsics or asm).

Another alternative would be to go back to the original patch,
that sometimes goes to a higher architecture level and then
adds +no options.  But that feels dangerous too.  Going back to
the example above, it seems wrong in principle to emit
-march=armv8.3-a+stuff for A64FX.

So after all that, I suppose we're just going to have to accept that
anyone using -mcpu=native with GCC15+ on a target that GCC doesn't
recognise will need to use binutils 2.42+.  It would be worth a mention
in the release notes, under the "Caveats" section.

Does anyone disagree?

Thanks,
Richard

Reply via email to