Andrew Carlotti <andrew.carlo...@arm.com> writes:
> On Mon, Nov 25, 2024 at 11:26:39PM +0000, Richard Sandiford wrote:
>> Sorry for the slow review.
>> 
>> Andrew Carlotti <andrew.carlo...@arm.com> writes:
>> > These new flags (+fcma, +jscvt, +rcpc2, +jscvt, +frintts, +wfxt and +xs)
>> > were only recently added to the assembler.  To improve compatibility
>> > with older assemblers, we try to avoid passing these new flags to the
>> > assembler if we can express the targetted architecture without them. We
>> > do so by using an almost-equivalent architecture string with a higher
>> > architecture version.
>> >
>> > This should never reduce the set of instructions accepted by the
>> > assembler.  It will make it more lenient in two cases:
>> >
>> > 1. Many system registers are currently gated behind architecture
>> > versions instead of specific feature flags.  Increasing the base
>> > architecture version may cause more system register accesses to be
>> > accepted.
>> >
>> > 2. FEAT_XS doesn't have an HWCAP bit or cpuinfo entry.  We still want to
>> > avoid passing +wfxt or +noxs to the assembler if possible, so we'll
>> > instruct the assembler to accept FEAT_XS instructions as well whenever
>> > the rest of the new features are enabled.
>> >
>> > gcc/ChangeLog:
>> >
>> >    * common/config/aarch64/aarch64-common.cc
>> >    (aarch64_get_arch_string_for_assembler): New.
>> >    (aarch64_rewrite_march): New.
>> >    (aarch64_rewrite_selected_cpu): Call new function.
>> >    * config/aarch64/aarch64-elf.h (ASM_SPEC): Remove identity mapping.
>> >    * config/aarch64/aarch64-protos.h
>> >    (aarch64_get_arch_string_for_assembler): New.
>> >    * config/aarch64/aarch64.cc
>> >    (aarch64_declare_function_name): Call new function.
>> >    (aarch64_start_file): Ditto.
>> >    * config/aarch64/aarch64.h
>> >    * config/aarch64/aarch64.h
>> >    (EXTRA_SPEC_FUNCTIONS): Use new macro name.
>> >    (MCPU_TO_MARCH_SPEC): Rename to...
>> >    (MARCH_REWRITE_SPEC): ...this, and add new spec rule.
>> >    (aarch64_rewrite_march): New declaration.
>> >    (MCPU_TO_MARCH_SPEC_FUNCTIONS): Rename to...
>> >    (MARCH_REWRITE_SPEC_FUNCTIONS): ...this, and add new function.
>> >    (ASM_CPU_SPEC): Use new macro name.
>> >
>> > gcc/testsuite/ChangeLog:
>> >
>> >    * gcc.target/aarch64/cpunative/native_cpu_21.c: Update check.
>> >    * gcc.target/aarch64/cpunative/native_cpu_22.c: Update check.
>> >    * gcc.target/aarch64/cpunative/info_27: New test.
>> >    * gcc.target/aarch64/cpunative/info_28: New test.
>> >    * gcc.target/aarch64/cpunative/info_29: New test.
>> >    * gcc.target/aarch64/cpunative/native_cpu_27.c: New test.
>> >    * gcc.target/aarch64/cpunative/native_cpu_28.c: New test.
>> >    * gcc.target/aarch64/cpunative/native_cpu_29.c: New test.
>> >
>> >
>> > diff --git a/gcc/common/config/aarch64/aarch64-common.cc 
>> > b/gcc/common/config/aarch64/aarch64-common.cc
>> > index 
>> > 2bfc597e333b6018970a9ee6e370a66b6d0960ef..717b3238be16f39a6fd1b4143662eb540ccf292d
>> >  100644
>> > --- a/gcc/common/config/aarch64/aarch64-common.cc
>> > +++ b/gcc/common/config/aarch64/aarch64-common.cc
>> > @@ -371,6 +371,119 @@ aarch64_get_extension_string_for_isa_flags
>> >    return outstr;
>> >  }
>> >  
>> > +/* Generate an arch string to be passed to the assembler.
>> > +
>> > +   Several flags were added retrospectively for features that were 
>> > previously
>> > +   enabled only by specifying an architecture version.  We want to avoid
>> > +   passing these flags to the assembler if possible, to improve 
>> > compatibility
>> > +   with older assemblers.  */
>> > +
>> > +std::string
>> > +aarch64_get_arch_string_for_assembler (aarch64_arch arch,
>> > +                                 aarch64_feature_flags flags)
>> > +{
>> > +  if (!(flags & AARCH64_FL_FCMA) || !(flags & AARCH64_FL_JSCVT))
>> > +    goto done;
>> > +
>> > +  if (arch == AARCH64_ARCH_V8A
>> > +      || arch == AARCH64_ARCH_V8_1A
>> > +      || arch == AARCH64_ARCH_V8_2A)
>> > +    arch = AARCH64_ARCH_V8_3A;
>> > +
>> > +  if (!(flags & AARCH64_FL_RCPC2))
>> > +    goto done;
>> > +
>> > +  if (arch == AARCH64_ARCH_V8_3A)
>> > +    arch = AARCH64_ARCH_V8_4A;
>> > +
>> > +  if (!(flags & AARCH64_FL_FRINTTS) || !(flags & AARCH64_FL_FLAGM2))
>> > +    goto done;
>> > +
>> > +  if (arch == AARCH64_ARCH_V8_4A)
>> > +    arch = AARCH64_ARCH_V8_5A;
>> > +
>> > +  if (!(flags & AARCH64_FL_WFXT))
>> > +    goto done;
>> > +
>> > +  if (arch == AARCH64_ARCH_V8_5A || arch == AARCH64_ARCH_V8_6A)
>> > +    {
>> > +      arch = AARCH64_ARCH_V8_7A;
>> > +      /* We don't support native detection for FEAT_XS, so we'll assume 
>> > it's
>> > +   present if the rest of these features are also present.  If we don't
>> > +   do this, then we would end up passing +noxs to the assembler.  */
>> > +      flags |= AARCH64_FL_XS;
>> > +    }
>> > +done:
>> > +
>> > +  const struct arch_to_arch_name* a_to_an;
>> > +  for (a_to_an = all_architectures;
>> > +       a_to_an->arch != aarch64_no_arch;
>> > +       a_to_an++)
>> > +    {
>> > +      if (a_to_an->arch == arch)
>> > +  break;
>> > +    }
>> > +
>> > +  std::string outstr = a_to_an->arch_name
>> > +  + aarch64_get_extension_string_for_isa_flags (flags, a_to_an->flags);
>> > +
>> > +  return outstr;
>> > +}
>> 
>> I was hoping we could do this in a table-driven way.  Experimenting
>> a bit locally (but only lightly tested), the following seems to work:
>> 
>> aarch64.h:
>> 
>> /* The set of all architecture flags.  */
>> constexpr auto AARCH64_FL_ARCHES ATTRIBUTE_UNUSED = aarch64_feature_flags (0)
>> #define AARCH64_ARCH(A, B, ARCH_IDENT, D, E) \
>>   | feature_deps::ARCH_IDENT ().flag
>> #include "config/aarch64/aarch64-arches.def"
>> ;
>> 
>> aarch64-common.cc:
>> 
>> ...
>>   const struct arch_to_arch_name *best = nullptr;
>>   for (auto *a_to_an = all_architectures;
>>        a_to_an->arch != aarch64_no_arch;
>>        a_to_an++)
>>     {
>>       /* Require the architecture to have all architecture flags in FLAGS.  
>> */
>>       if ((~a_to_an->flags & flags & AARCH64_FL_ARCHES) != 0)
>>      continue;
>> 
>>       /* Skip architectures that add no new mandatory features.  */
>>       if (best && (a_to_an->flags & ~best->flags & ~AARCH64_FL_ARCHES) == 0)
>>      continue;
>> 
>>       /* Require FLAGS to include all mandatory extensions.  */
>>       if ((a_to_an->flags & ~flags & ~AARCH64_FL_ARCHES) != 0)
>>         continue;
>> 
>>       best = a_to_an;
>>     }
>
> There are some hypothetical cases in which your suggested approach wouldn't be
> able to avoid the new feature flag, whereas my more targetted approach here
> would.  I'm struggling to think of a realistic example though, as I can only
> think of one core in which any of these features have been backported to an
> earlier architecture version.  The closest bad examples I can find are:
>
> - Using -mcpu=a64fx+nosve: This would canonicalise to armv8.2-a+f16+fcma with
>   either approach.  If jscvt didn't exist, however, then my approach would
>   instead give armv8.3-a+nopauth+norcpc.
> - Running on an unrecognised future cpu with an unfortunate combination of
>   features could also be an example, but that's probably unrealistic.

Thanks for considering these cases.  But like you say, the first one
is counterfactual, in that we do have jscvt.  And personally, I think
it's less surprising to add +fcma in that case anyway, rather than have
+nosve bump the architecture version from armv8.2-a to armv8.3-a.
It seems odd for +no to increase the architecture level.

So IMO, the fact that we use the "correct" base architecture is a plus
rather than a minus.  This is more about choosing between multiple ways
of telling a direct truth, rather than trying to be indirect.

> I also feel that this is a work-around that we should be applying in 
> moderation
> only when we know it would actually help with compatibility issues.

I can see that.  But I think it's easier to reason about, and less
surprising, if the behaviour stems from easily-understood rules,
rather than being a list of special cases.

Also, I think replacing -march=armv8-a+all+the+features+for+v8.8-a
with -march=armv8.8-a is an independent good.  So...

> However, if you'd still prefer to use a bigger table-driven hammer, then I can
> change the patch to do that.

...yeah, I would still prefer that, sorry. :)

>> > +
>> > +/* Called by the driver to rewrite a name passed to the -march
>> > +   argument in preparation to be passed to the assembler.  The
>> > +   names passed from the commend line will be in ARGV, we want
>> > +   to use the right-most argument, which should be in
>> > +   ARGV[ARGC - 1].  ARGC should always be greater than 0.  */
>> > +
>> > +const char *
>> > +aarch64_rewrite_march (int argc, const char **argv)
>> > +{
>> > +  gcc_assert (argc);
>> > +  const char *name = argv[argc - 1];
>> > +  std::string original_string (name);
>> > +  std::string extension_str;
>> > +  std::string base_name;
>> > +  size_t extension_pos = original_string.find_first_of ('+');
>> > +
>> > +  /* Strip and save the extension string.  */
>> > +  if (extension_pos != std::string::npos)
>> > +    {
>> > +      base_name = original_string.substr (0, extension_pos);
>> > +      extension_str = original_string.substr (extension_pos,
>> > +                                        std::string::npos);
>> > +    }
>> > +  else
>> > +    {
>> > +      /* No extensions.  */
>> > +      base_name = original_string;
>> > +    }
>> > +
>> > +  const struct arch_to_arch_name* a_to_an;
>> > +  for (a_to_an = all_architectures;
>> > +       a_to_an->arch != aarch64_no_arch;
>> > +       a_to_an++)
>> > +    {
>> > +      if (a_to_an->arch_name == base_name)
>> > +  break;
>> > +    }
>> > +
>> > +  /* We couldn't find that architecture name.  */
>> > +  if (a_to_an->arch == aarch64_no_arch)
>> > +    fatal_error (input_location, "unknown value %qs for %<-march%>", 
>> > name);
>> > +
>> > +  aarch64_feature_flags flags = a_to_an->flags;
>> > +  aarch64_parse_extension (extension_str.c_str (), &flags, NULL);
>> > +
>> > +  std::string outstr = aarch64_get_arch_string_for_assembler 
>> > (a_to_an->arch,
>> > +                                                        flags);
>> > +
>> > +  /* We are going to memory leak here, nobody elsewhere
>> > +     in the callchain is going to clean up after us.  The alternative is
>> > +     to allocate a static buffer, and assert that it is big enough for our
>> > +     modified string, which seems much worse!  */
>> > +  return xstrdup (outstr.c_str ());
>> > +}
>> 
>> This is going to seem like feature creep, sorry, but: rather than
>> duplicate the architecture parsing, could we instead move the march
>> and mcpu processing from aarch64.cc to here?  Specifically:
>> 
>> - aarch64_parse_arch
>> - aarch64_parse_cpu
>> - aarch64_validate_mcpu
>> - aarch64_validate_march
>> - aarch64_print_hint_for_*
>> 
>> This would mean making "struct processor" public, and so giving it
>> an aarch64_ name (or putting it in a namespace).  We'd also need to
>> remove the tuning information and use a separate table for that.
>> Still, I think it would be more robust than having two pieces of code
>> doing the same parsing.  It should also give a better UI, since the
>> driver parsing would give the same hints as the compiler proper.
>> 
>> (Only tested to the point of moving the code and linking the driver.)
>
> This seems like a reasonable improvement, though I do see the "seems like
> feature creep" remark.

I'm happy to do this refactor if that's easier, since I know you have
other urgent things to take care of.

Richard

Reply via email to