Re: [PATCH 06/10] testsuite: aarch64: arm: Add -mfpu=auto to arm_v8_2a_bf16_neon_ok

Christophe Lyon Tue, 25 Mar 2025 06:14:06 -0700

On Mon, 24 Mar 2025 at 16:13, Richard Earnshaw (lists)
<richard.earns...@arm.com> wrote:
>
> On 24/03/2025 14:52, Christophe Lyon wrote:
> > On Mon, 24 Mar 2025 at 15:13, Richard Earnshaw (lists)
> > <richard.earns...@arm.com> wrote:
> >>
> >> On 21/03/2025 17:30, Christophe Lyon wrote:
> >>> On Fri, 21 Mar 2025 at 16:51, Richard Earnshaw (lists)
> >>> <richard.earns...@arm.com> wrote:
> >>>>
> >>>> On 21/03/2025 15:15, Christophe Lyon wrote:
> >>>>> On Fri, 21 Mar 2025 at 15:25, Richard Earnshaw (lists)
> >>>>> <richard.earns...@arm.com> wrote:
> >>>>>>
> >>>>>> On 21/03/2025 14:05, Christophe Lyon wrote:
> >>>>>>> On Fri, 21 Mar 2025 at 11:18, Richard Earnshaw (lists)
> >>>>>>> <richard.earns...@arm.com> wrote:
> >>>>>>>>
> >>>>>>>> On 20/03/2025 16:15, Christophe Lyon wrote:
> >>>>>>>>> Depending on if/how the testing flags are overridden, the first 
> >>>>>>>>> value
> >>>>>>>>> we try("") might not do what we want.
> >>>>>>>>>
> >>>>>>>>> For instance, if the whole testsuite is executed with
> >>>>>>>>> (A) -mthumb -march=armv7-m -mtune=cortex-m3 -mfloat-abi=softfp
> >>>>>>>>>
> >>>>>>>>> bf16_neon_ok is first compiled with
> >>>>>>>>> (A) (B)
> >>>>>>>>> where B = -mcpu=unset -march=armv8.2-a+bf16
> >>>>>>>>>
> >>>>>>>>> which is accepted, so a testcase like vld2q_lane_bf16_indices_1.c
> >>>>>>>>> is compiled with:
> >>>>>>>>> (A) (C) (B)
> >>>>>>>>> where C = -mfpu=neon -mfloat-abi=softfp -mcpu=unset -march=armv7-a 
> >>>>>>>>> -mfpu=neon-fp16 -mfp16-format=ieee
> >>>>>>>>>
> >>>>>>>>> because advsimd-intrinsics.exp has set additional_flags to (C)
> >>>>>>>>> via arm_neon_fp16_ok
> >>>>>>>>>
> >>>>>>>>> So the testcase is compiled with
> >>>>>>>>> [...] -mfpu=neon-fp16 -mcpu=unset -march=armv8.2-a+bf16
> >>>>>>>>> (thus -mfpu=neon-fp16) and bf16 support is disabled.
> >>>>>>>>>
> >>>>>>>>> The patch replaces "" with -mfpu=auto which matches the intended
> >>>>>>>>> effect of -march=armv8.2-a+bf16 as added by bf16_neon_ok, and the
> >>>>>>>>> testcase is now compiled with
> >>>>>>>>> (A) (C) -mfpu=auto (B)
> >>>>>>>>>
> >>>>>>>>> However, since this effective-target is also used on aarch64 (which
> >>>>>>>>> does not support -mfpu=auto), we do this only on arm.
> >>>>>>>>>
> >>>>>>>>> This patch improves coverage, and makes
> >>>>>>>>> v{ld,st}[234]q_lane_bf16_indices_1.c pass when testsuite flags are
> >>>>>>>>> overridden as described above (e.g. for M-profile).
> >>>>>>>>>
> >>>>>>>>>       gcc/testsuite/
> >>>>>>>>>       * lib/target-supports.exp
> >>>>>>>>>       (check_effective_target_arm_v8_2a_bf16_neon_ok_nocache):
> >>>>>>>>>       Conditionally use -mfpu=auto.
> >>>>>>>>> ---
> >>>>>>>>>  gcc/testsuite/lib/target-supports.exp | 9 ++++++++-
> >>>>>>>>>  1 file changed, 8 insertions(+), 1 deletion(-)
> >>>>>>>>>
> >>>>>>>>> diff --git a/gcc/testsuite/lib/target-supports.exp 
> >>>>>>>>> b/gcc/testsuite/lib/target-supports.exp
> >>>>>>>>> index e2622a445c5..09b16a14024 100644
> >>>>>>>>> --- a/gcc/testsuite/lib/target-supports.exp
> >>>>>>>>> +++ b/gcc/testsuite/lib/target-supports.exp
> >>>>>>>>> @@ -6871,12 +6871,19 @@ proc add_options_for_arm_fp16fml_neon { 
> >>>>>>>>> flags } {
> >>>>>>>>>  proc check_effective_target_arm_v8_2a_bf16_neon_ok_nocache { } {
> >>>>>>>>>      global et_arm_v8_2a_bf16_neon_flags
> >>>>>>>>>      set et_arm_v8_2a_bf16_neon_flags ""
> >>>>>>>>> +    set fpu_auto ""
> >>>>>>>>>
> >>>>>>>>>      if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } {
> >>>>>>>>>       return 0;
> >>>>>>>>>      }
> >>>>>>>>>
> >>>>>>>>> -    foreach flags {"" "-mfloat-abi=softfp -mfpu=neon-fp-armv8" 
> >>>>>>>>> "-mfloat-abi=hard -mfpu=neon-fp-armv8" } {
> >>>>>>>>> +    if { [istarget arm*-*-*] } {
> >>>>>>>>> +     set fpu_auto "-mfpu=auto"
> >>>>>>>>> +    }
> >>>>>>>>> +
> >>>>>>>>> +    foreach flags [list "$fpu_auto" \
> >>>>>>>>
> >>>>>>>> Shouldn't we try first with "", even on Arm?  Thus
> >>>>>>>>        foreach flags [list "" "$fpu_auto" \
> >>>>>>>> ...
> >>>>>>>>
> >>>>>>> I don't think so, that's why I tried to explain above.
> >>>>>>> "" is acceptable / accepted in arm_v8_2a_bf16_neon_ok
> >>>>>>> (this is (A) (B) above, where the important parts are:
> >>>>>>> -march=armv7-m -mcpu=unset -march=armv8.2-a+bf16
> >>>>>>> (so -mfpu is set to the toolchain's default)
> >>>>>>
> >>>>>> That's never going to work reliably.  We need to check, somewhere, the 
> >>>>>> full set of options we intend to pass to the compilation.  We can't 
> >>>>>> assume that separately testing if A is ok and B is ok => A + B is ok.
> >>>>>>
> >>>>>
> >>>>> Hmmm I think I raised that problem years ago, because of the way the
> >>>>> test system is designed...
> >>>>>
> >>>>>>>
> >>>>>>> but then the actual testcase is compiled with additional flags (C)
> >>>>>>> defined by the test driver using arm_neon_fp16_ok
> >>>>>>> C = -mfpu=neon -mfloat-abi=softfp -mcpu=unset -march=armv7-a
> >>>>>>> -mfpu=neon-fp16 -mfp16-format=ieee
> >>>>>>>
> >>>>>>> so the relevant parts of (A) (C) (B) are:
> >>>>>>> -march=armv7-m  -mfpu=neon -mcpu=unset -march=armv7-a -mfpu=neon-fp16
> >>>>>>> -mcpu=unset -march=armv8.2-a+bf16
> >>>>>>> which can be simplified into
> >>>>>>> -mfpu=neon-fp16 -march=armv8.2-a+bf16
> >>>>>>>
> >>>>>>> I think we need to start with -mfpu=auto instead of "", so that when
> >>>>>>> -march=armv8.2-a+bf16 takes effect, we have cancelled any other -mfpu.
> >>>>>>>
> >>>>>>
> >>>>>> Ideally a test shouldn't add any options in some test runs; that way 
> >>>>>> we add some 'base configuration' coverage.  We obviously can't do that 
> >>>>>> everywhere and there may still be cases where the system can't test 
> >>>>>> anything useful at all (hopefully only when linking, or more is 
> >>>>>> needed).
> >>>>>>
> >>>>>
> >>>>> Not sure to follow? If a test wants to check that a given feature
> >>>>> works as expected, it has to use the appropriate options, doesn't it?
> >>>>
> >>>> Not if the base configuration for the run (from the compiler config or 
> >>>> from the RUNTEST flags) already supports that feature.
> >>>>
> >>>
> >>> OK, but isn't this error-prone?
> >>
> >> This whole overriding options thing is error prone, unfortunately.  I've 
> >> sometimes wondered if we should have some directories in the testsuite 
> >> where we just suppress all options passed by the user to the framework (so 
> >> we only need to consider the options configured into the compiler itself). 
> >>  But even then we have to think about header file compatibility, so I'm 
> >> not sure even that would make things much better.  Also, such tests would 
> >> have to be limited to compile only (we can't assume the right libs will 
> >> exist for linking, much less running the code).
> >>
> >>
> >>> The effective-target unconditionally adds -mcpu=unset -march=XXX
> >>> already, so relying on the implicit default value of -mcpu seems
> >>> inconsistent?
> >>
> >> So perhaps we just have the wrong effective target?
> >>
> > If we consider the one this patch modifies: arm_v8_2a_bf16_neon_ok
> > AFAIU, it looks for the right set of flags needed to have
> > __ARM_FEATURE_BF16_VECTOR_ARITHMETIC defined,
> > so it tries -mfloat-abi / -mfpu combinations in addition to
> > -mcpu=unset -march=armv8.2-a+bf16
> > It seems consistent with the other numerous effective targets we have?
> >
>
> Right, but testing to see if __ARM_FEATURE_BF16_VECTOR_ARITHMETIC is defined 
> already (with no overrides) should be done first.  Otherwise we might as well 
> abandon pretty much all of the options and override everything straight away.
>


Agreed, I'm not sure if testing with "" is useful at all. AFAIU, all
the tests which use require-effectve-target ensure that a given
feature is available, and possibly add some flags to ensure that.
They don't care how this is achieved (whether with "" or with
"-mfloat-abi=XXX -mfpu=YYY"), they just rely on the feature being
enabled.  We still have to detect whether -mfloat-abi=XXX is needed in
case the toolchain does not support all of soft/softfp/hard though.

> >
> >>> The would also lead to confusion when comparing two logs with the
> >>> exact same options, but different result because the toolchain was
> >>> built with a different --with-fpu= option (which IIRC does not appear
> >>> in gcc.log).
> >>>
> >>> Not sure what accepting "" rather than -mfpu=auto gives us?  Given we
> >>> are checking which flags (-mfloat-abi for instance) are need to
> >>> support -march=XXX, don't we want to make sure the corresponding fpu
> >>> is in use?
> >>
> >> See below.
> >>
> >>>
> >>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>>> +                    "-mfloat-abi=softfp -mfpu=neon-fp-armv8" \
> >>>>>>>>> +                    "-mfloat-abi=hard -mfpu=neon-fp-armv8" ] {
> >>>>>>>>>       if { [check_no_compiler_messages_nocache 
> >>>>>>>>> arm_v8_2a_bf16_neon_ok object {
> >>>>>>>>>           #include <arm_neon.h>
> >>>>>>>>>           #if !defined (__ARM_FEATURE_BF16_VECTOR_ARITHMETIC)
> >>>>>>>>
> >>>>>>>> In fact, since aarch64 doesn't support -mfpu at all, only "" is ever 
> >>>>>>>> going to work here, so perhaps we can recast all this code, perhaps 
> >>>>>>>> along the lines of that in 
> >>>>>>>> check_effective_target_arm_neon_h_ok_nocache so that we don't need 
> >>>>>>>> to actually try to include arm_neon.h at all while figuring out the 
> >>>>>>>> options.
> >>>>>>>>
> >>>>>>>
> >>>>>>> Maybe.
> >>>>>>>
> >>>>>>> OTOH, actually including arm_neon.h is guaranteed to give the right
> >>>>>>> answer to the question "which flags do we need to be able to include
> >>>>>>> arm_neon.h?", rather than using several conditions that are supposed
> >>>>>>> to match what arm_neon.h does?
> >>>>>>>
> >>>>>>
> >>>>>> Working out whether arm_neon.h can be included can be done with 
> >>>>>> arm_neon_h_ok.  That's currently Arm only, but it's trivial to extend 
> >>>>>> it to aarch64.  Once we have that, we can write some rules that build 
> >>>>>> on that base to get other features that we might desire.  But we must, 
> >>>>>> at some point check, rather than assume, that combining options from 
> >>>>>> different rules will result in something sane.
> >>>>>>
> >>>>>
> >>>>> I suspect most tests which currently use arm*neon* effective targets
> >>>>> actually include arm_neon.h, they should arm_neon_h_ok.
> >>>>>
> >>>>> But I'm not sure what we can do to make sure, rather than assume that
> >>>>> A + C + B does what we want, given that:
> >>>>> - A is defined by the user who runs the testsuite (via RUNTESTFLAGS /
> >>>>> target_board)
> >>>>> - C is defined by the test harness (advsimd-intrinsics.exp)
> >>>>> - B is defined as needed by individual tests
> >>>>>
> >>>>> At least we have no control on A.
> >>>>
> >>>> We have no control of A, but any require_... tests will add A to the 
> >>>> option list they test; so we can override them if needed and check the 
> >>>> override works.
> >>>
> >>> Yes, that's already what currently happens.
> >>>
> >>>> C is more complex - we perhaps need some bespoke checks in 
> >>>> advsimd-intrinsics.exp that can be used to ensure that it's base flags 
> >>>> are added to any tests we run.  I need to think about that one a bit; 
> >>>> perhaps we can arrange to add these flags to A before the individual 
> >>>> tests run their additional checks.
> >>>>
> >>> what is the advantage compared to replacing "" with -mfpu=auto, which
> >>> seems simpler?
> >>
> >> Consider
> >>
> >>  -mcpu=cortex-a53 -march=armv8-a+crc -mfpu=neon-fp-armv8 -mfloat-abi=softfp
> >>
> > where would that list come from? Is that user-supplied overrides, or a
> > concatenation of effective targets?
>
> Yes, user-supplied.  It's based on (though not identical to) an option set 
> Torbjorn tests for his m-profile devices.  I guess it could also come from 
> the compiler-configured defaults as well, though:  --with-cpu=cortex-a53 
> --with-arch=armv8-a+crc --with-fpu=neon-fp-armv8 --with-float=softfp would 
> have the same effect.
>
> >
> >> which works just fine.
> >>
> >> If you add -mfpu=auto (overriding the -mfpu=noen-fp-armv8) to that you get
> >>
> >> cc1: warning: switch ‘-mcpu=cortex-a53’ conflicts with switch 
> >> ‘-march=armv8-a+crc’
> >>
> >> because the underlying arch flags don't match any more.
> >
> > If -mtpu=auto would be added by an effective target (like in this
> > patch for instance),
> > it would also add -mcpu=unset before the relevant -march, so that
> > particular sequence shouldn't be generated?
>
> Adding -mfpu=auto when we also add/override the CPU/arch settings is one 
> thing (since we're saying that those options will be providing the right 
> information about the FPU).  It's also the way ETs like arm_v8a_hard work.  
> But the more I think about it, the more I'm sure that just using it on its 
> own is not the right way forward.
>
In this case it's not on its own (or I misunderstand what you mean):
we always add -mcpu=unset -march=armv8.2-a+bf16, so we'd effectively
try
-mfpu=auto  -mcpu=unset -march=armv8.2-a+bf16  first.

What do you have in mind as a way forward?

> >
> > -march=armv8-a+crc is added by arm_crc_ok, but it also adds
> > -mcpu=unset, so that would cancel -mcpu=cortex-a53
> > -mfpu=neon-fp-armv8 can be added by several effective targets, all of
> > which add -mcpu=unset already
> > and if -mfpu=auto alone would cause an error, they would iterate and
> > could select -mfpu=neon-fp-armv8 is it works (arm_v8_neon_ok for
> > instance)
> >
>
> Apart from a small set of tests for the legacy -mfpu options, I'd really like 
> to eliminate all uses of -mfpu= (other than auto) from the entire testsuite.  
> But that's a different piece of work and I don't want to conflate that change 
> with what you're doing here if we can avoid it  (though eliminating -mfpu= in 
> this way for this specific rule would be ok if that were the right solution).
>

Do you mean for instance in arm_v8_2a_bf16_neon_ok_nocache, replacing
the existing -mfpu=neon-fp-armv8 with -mfpu=auto?

> >
> >>
> >> It's even worse for -mfloat-abi=hard because that will error:
> >>
> >> cc1: error: ‘-mfloat-abi=hard’: selected architecture lacks an FPU
> >>
> >>
> >>>
> >>>>>
> >>>>> We can create a new directory, where the .exp test harness would not
> >>>>> use additional options, and move tests which need
> >>>>> effective-target-arm-arch* there.
> >>>>> (that is 189 files out of 541 under advsimd-intrinsics/)
> >>>>>
> >>>>
> >>>> Those tests might benefit from a refactor anyway - the common 
> >>>> arm/aarch64 tests should probably live in dg.target/aarch, and only the 
> >>>> aarch64-specific tests should be in dg.target/aarch64.
> >>>>
> >>> Indeed, but isn't this an orthogonal problem?
> >>>
> >>> Thanks,
> >>>
> >>> Christophe
> >>>
> >>>> R.
> >>>>
> >>>>> Thanks,
> >>>>>
> >>>>> Christophe
> >>>>>
> >>>>>
> >>>>>> R.
> >>>>>>
> >>>>>>> Thanks,
> >>>>>>>
> >>>>>>> Christophe
> >>>>>>>
> >>>>>>>> R.
> >>>>>>
> >>>>
> >>
> >> R.
>
> R.

Re: [PATCH 06/10] testsuite: aarch64: arm: Add -mfpu=auto to arm_v8_2a_bf16_neon_ok

Reply via email to