Re: [PATCH 06/10] testsuite: aarch64: arm: Add -mfpu=auto to arm_v8_2a_bf16_neon_ok

Richard Earnshaw (lists) Mon, 24 Mar 2025 08:51:25 -0700

On 24/03/2025 14:52, Christophe Lyon wrote:
> On Mon, 24 Mar 2025 at 15:13, Richard Earnshaw (lists)
> <richard.earns...@arm.com> wrote:
>>
>> On 21/03/2025 17:30, Christophe Lyon wrote:
>>> On Fri, 21 Mar 2025 at 16:51, Richard Earnshaw (lists)
>>> <richard.earns...@arm.com> wrote:
>>>>
>>>> On 21/03/2025 15:15, Christophe Lyon wrote:
>>>>> On Fri, 21 Mar 2025 at 15:25, Richard Earnshaw (lists)
>>>>> <richard.earns...@arm.com> wrote:
>>>>>>
>>>>>> On 21/03/2025 14:05, Christophe Lyon wrote:
>>>>>>> On Fri, 21 Mar 2025 at 11:18, Richard Earnshaw (lists)
>>>>>>> <richard.earns...@arm.com> wrote:
>>>>>>>>
>>>>>>>> On 20/03/2025 16:15, Christophe Lyon wrote:
>>>>>>>>> Depending on if/how the testing flags are overridden, the first value
>>>>>>>>> we try("") might not do what we want.
>>>>>>>>>
>>>>>>>>> For instance, if the whole testsuite is executed with
>>>>>>>>> (A) -mthumb -march=armv7-m -mtune=cortex-m3 -mfloat-abi=softfp
>>>>>>>>>
>>>>>>>>> bf16_neon_ok is first compiled with
>>>>>>>>> (A) (B)
>>>>>>>>> where B = -mcpu=unset -march=armv8.2-a+bf16
>>>>>>>>>
>>>>>>>>> which is accepted, so a testcase like vld2q_lane_bf16_indices_1.c
>>>>>>>>> is compiled with:
>>>>>>>>> (A) (C) (B)
>>>>>>>>> where C = -mfpu=neon -mfloat-abi=softfp -mcpu=unset -march=armv7-a 
>>>>>>>>> -mfpu=neon-fp16 -mfp16-format=ieee
>>>>>>>>>
>>>>>>>>> because advsimd-intrinsics.exp has set additional_flags to (C)
>>>>>>>>> via arm_neon_fp16_ok
>>>>>>>>>
>>>>>>>>> So the testcase is compiled with
>>>>>>>>> [...] -mfpu=neon-fp16 -mcpu=unset -march=armv8.2-a+bf16
>>>>>>>>> (thus -mfpu=neon-fp16) and bf16 support is disabled.
>>>>>>>>>
>>>>>>>>> The patch replaces "" with -mfpu=auto which matches the intended
>>>>>>>>> effect of -march=armv8.2-a+bf16 as added by bf16_neon_ok, and the
>>>>>>>>> testcase is now compiled with
>>>>>>>>> (A) (C) -mfpu=auto (B)
>>>>>>>>>
>>>>>>>>> However, since this effective-target is also used on aarch64 (which
>>>>>>>>> does not support -mfpu=auto), we do this only on arm.
>>>>>>>>>
>>>>>>>>> This patch improves coverage, and makes
>>>>>>>>> v{ld,st}[234]q_lane_bf16_indices_1.c pass when testsuite flags are
>>>>>>>>> overridden as described above (e.g. for M-profile).
>>>>>>>>>
>>>>>>>>>       gcc/testsuite/
>>>>>>>>>       * lib/target-supports.exp
>>>>>>>>>       (check_effective_target_arm_v8_2a_bf16_neon_ok_nocache):
>>>>>>>>>       Conditionally use -mfpu=auto.
>>>>>>>>> ---
>>>>>>>>>  gcc/testsuite/lib/target-supports.exp | 9 ++++++++-
>>>>>>>>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>>>>>>>>
>>>>>>>>> diff --git a/gcc/testsuite/lib/target-supports.exp 
>>>>>>>>> b/gcc/testsuite/lib/target-supports.exp
>>>>>>>>> index e2622a445c5..09b16a14024 100644
>>>>>>>>> --- a/gcc/testsuite/lib/target-supports.exp
>>>>>>>>> +++ b/gcc/testsuite/lib/target-supports.exp
>>>>>>>>> @@ -6871,12 +6871,19 @@ proc add_options_for_arm_fp16fml_neon { flags 
>>>>>>>>> } {
>>>>>>>>>  proc check_effective_target_arm_v8_2a_bf16_neon_ok_nocache { } {
>>>>>>>>>      global et_arm_v8_2a_bf16_neon_flags
>>>>>>>>>      set et_arm_v8_2a_bf16_neon_flags ""
>>>>>>>>> +    set fpu_auto ""
>>>>>>>>>
>>>>>>>>>      if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } {
>>>>>>>>>       return 0;
>>>>>>>>>      }
>>>>>>>>>
>>>>>>>>> -    foreach flags {"" "-mfloat-abi=softfp -mfpu=neon-fp-armv8" 
>>>>>>>>> "-mfloat-abi=hard -mfpu=neon-fp-armv8" } {
>>>>>>>>> +    if { [istarget arm*-*-*] } {
>>>>>>>>> +     set fpu_auto "-mfpu=auto"
>>>>>>>>> +    }
>>>>>>>>> +
>>>>>>>>> +    foreach flags [list "$fpu_auto" \
>>>>>>>>
>>>>>>>> Shouldn't we try first with "", even on Arm?  Thus
>>>>>>>>        foreach flags [list "" "$fpu_auto" \
>>>>>>>> ...
>>>>>>>>
>>>>>>> I don't think so, that's why I tried to explain above.
>>>>>>> "" is acceptable / accepted in arm_v8_2a_bf16_neon_ok
>>>>>>> (this is (A) (B) above, where the important parts are:
>>>>>>> -march=armv7-m -mcpu=unset -march=armv8.2-a+bf16
>>>>>>> (so -mfpu is set to the toolchain's default)
>>>>>>
>>>>>> That's never going to work reliably.  We need to check, somewhere, the 
>>>>>> full set of options we intend to pass to the compilation.  We can't 
>>>>>> assume that separately testing if A is ok and B is ok => A + B is ok.
>>>>>>
>>>>>
>>>>> Hmmm I think I raised that problem years ago, because of the way the
>>>>> test system is designed...
>>>>>
>>>>>>>
>>>>>>> but then the actual testcase is compiled with additional flags (C)
>>>>>>> defined by the test driver using arm_neon_fp16_ok
>>>>>>> C = -mfpu=neon -mfloat-abi=softfp -mcpu=unset -march=armv7-a
>>>>>>> -mfpu=neon-fp16 -mfp16-format=ieee
>>>>>>>
>>>>>>> so the relevant parts of (A) (C) (B) are:
>>>>>>> -march=armv7-m  -mfpu=neon -mcpu=unset -march=armv7-a -mfpu=neon-fp16
>>>>>>> -mcpu=unset -march=armv8.2-a+bf16
>>>>>>> which can be simplified into
>>>>>>> -mfpu=neon-fp16 -march=armv8.2-a+bf16
>>>>>>>
>>>>>>> I think we need to start with -mfpu=auto instead of "", so that when
>>>>>>> -march=armv8.2-a+bf16 takes effect, we have cancelled any other -mfpu.
>>>>>>>
>>>>>>
>>>>>> Ideally a test shouldn't add any options in some test runs; that way we 
>>>>>> add some 'base configuration' coverage.  We obviously can't do that 
>>>>>> everywhere and there may still be cases where the system can't test 
>>>>>> anything useful at all (hopefully only when linking, or more is needed).
>>>>>>
>>>>>
>>>>> Not sure to follow? If a test wants to check that a given feature
>>>>> works as expected, it has to use the appropriate options, doesn't it?
>>>>
>>>> Not if the base configuration for the run (from the compiler config or 
>>>> from the RUNTEST flags) already supports that feature.
>>>>
>>>
>>> OK, but isn't this error-prone?
>>
>> This whole overriding options thing is error prone, unfortunately.  I've 
>> sometimes wondered if we should have some directories in the testsuite where 
>> we just suppress all options passed by the user to the framework (so we only 
>> need to consider the options configured into the compiler itself).  But even 
>> then we have to think about header file compatibility, so I'm not sure even 
>> that would make things much better.  Also, such tests would have to be 
>> limited to compile only (we can't assume the right libs will exist for 
>> linking, much less running the code).
>>
>>
>>> The effective-target unconditionally adds -mcpu=unset -march=XXX
>>> already, so relying on the implicit default value of -mcpu seems
>>> inconsistent?
>>
>> So perhaps we just have the wrong effective target?
>>
> If we consider the one this patch modifies: arm_v8_2a_bf16_neon_ok
> AFAIU, it looks for the right set of flags needed to have
> __ARM_FEATURE_BF16_VECTOR_ARITHMETIC defined,
> so it tries -mfloat-abi / -mfpu combinations in addition to
> -mcpu=unset -march=armv8.2-a+bf16
> It seems consistent with the other numerous effective targets we have?
>


Right, but testing to see if __ARM_FEATURE_BF16_VECTOR_ARITHMETIC is defined 
already (with no overrides) should be done first.  Otherwise we might as well 
abandon pretty much all of the options and override everything straight away.

> 
>>> The would also lead to confusion when comparing two logs with the
>>> exact same options, but different result because the toolchain was
>>> built with a different --with-fpu= option (which IIRC does not appear
>>> in gcc.log).
>>>
>>> Not sure what accepting "" rather than -mfpu=auto gives us?  Given we
>>> are checking which flags (-mfloat-abi for instance) are need to
>>> support -march=XXX, don't we want to make sure the corresponding fpu
>>> is in use?
>>
>> See below.
>>
>>>
>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>> +                    "-mfloat-abi=softfp -mfpu=neon-fp-armv8" \
>>>>>>>>> +                    "-mfloat-abi=hard -mfpu=neon-fp-armv8" ] {
>>>>>>>>>       if { [check_no_compiler_messages_nocache arm_v8_2a_bf16_neon_ok 
>>>>>>>>> object {
>>>>>>>>>           #include <arm_neon.h>
>>>>>>>>>           #if !defined (__ARM_FEATURE_BF16_VECTOR_ARITHMETIC)
>>>>>>>>
>>>>>>>> In fact, since aarch64 doesn't support -mfpu at all, only "" is ever 
>>>>>>>> going to work here, so perhaps we can recast all this code, perhaps 
>>>>>>>> along the lines of that in 
>>>>>>>> check_effective_target_arm_neon_h_ok_nocache so that we don't need to 
>>>>>>>> actually try to include arm_neon.h at all while figuring out the 
>>>>>>>> options.
>>>>>>>>
>>>>>>>
>>>>>>> Maybe.
>>>>>>>
>>>>>>> OTOH, actually including arm_neon.h is guaranteed to give the right
>>>>>>> answer to the question "which flags do we need to be able to include
>>>>>>> arm_neon.h?", rather than using several conditions that are supposed
>>>>>>> to match what arm_neon.h does?
>>>>>>>
>>>>>>
>>>>>> Working out whether arm_neon.h can be included can be done with 
>>>>>> arm_neon_h_ok.  That's currently Arm only, but it's trivial to extend it 
>>>>>> to aarch64.  Once we have that, we can write some rules that build on 
>>>>>> that base to get other features that we might desire.  But we must, at 
>>>>>> some point check, rather than assume, that combining options from 
>>>>>> different rules will result in something sane.
>>>>>>
>>>>>
>>>>> I suspect most tests which currently use arm*neon* effective targets
>>>>> actually include arm_neon.h, they should arm_neon_h_ok.
>>>>>
>>>>> But I'm not sure what we can do to make sure, rather than assume that
>>>>> A + C + B does what we want, given that:
>>>>> - A is defined by the user who runs the testsuite (via RUNTESTFLAGS /
>>>>> target_board)
>>>>> - C is defined by the test harness (advsimd-intrinsics.exp)
>>>>> - B is defined as needed by individual tests
>>>>>
>>>>> At least we have no control on A.
>>>>
>>>> We have no control of A, but any require_... tests will add A to the 
>>>> option list they test; so we can override them if needed and check the 
>>>> override works.
>>>
>>> Yes, that's already what currently happens.
>>>
>>>> C is more complex - we perhaps need some bespoke checks in 
>>>> advsimd-intrinsics.exp that can be used to ensure that it's base flags are 
>>>> added to any tests we run.  I need to think about that one a bit; perhaps 
>>>> we can arrange to add these flags to A before the individual tests run 
>>>> their additional checks.
>>>>
>>> what is the advantage compared to replacing "" with -mfpu=auto, which
>>> seems simpler?
>>
>> Consider
>>
>>  -mcpu=cortex-a53 -march=armv8-a+crc -mfpu=neon-fp-armv8 -mfloat-abi=softfp
>>
> where would that list come from? Is that user-supplied overrides, or a
> concatenation of effective targets?

Yes, user-supplied.  It's based on (though not identical to) an option set 
Torbjorn tests for his m-profile devices.  I guess it could also come from the 
compiler-configured defaults as well, though:  --with-cpu=cortex-a53 
--with-arch=armv8-a+crc --with-fpu=neon-fp-armv8 --with-float=softfp would have 
the same effect.

> 
>> which works just fine.
>>
>> If you add -mfpu=auto (overriding the -mfpu=noen-fp-armv8) to that you get
>>
>> cc1: warning: switch ‘-mcpu=cortex-a53’ conflicts with switch 
>> ‘-march=armv8-a+crc’
>>
>> because the underlying arch flags don't match any more.
> 
> If -mtpu=auto would be added by an effective target (like in this
> patch for instance),
> it would also add -mcpu=unset before the relevant -march, so that
> particular sequence shouldn't be generated?

Adding -mfpu=auto when we also add/override the CPU/arch settings is one thing 
(since we're saying that those options will be providing the right information 
about the FPU).  It's also the way ETs like arm_v8a_hard work.  But the more I 
think about it, the more I'm sure that just using it on its own is not the 
right way forward.

> 
> -march=armv8-a+crc is added by arm_crc_ok, but it also adds
> -mcpu=unset, so that would cancel -mcpu=cortex-a53
> -mfpu=neon-fp-armv8 can be added by several effective targets, all of
> which add -mcpu=unset already
> and if -mfpu=auto alone would cause an error, they would iterate and
> could select -mfpu=neon-fp-armv8 is it works (arm_v8_neon_ok for
> instance)
> 

Apart from a small set of tests for the legacy -mfpu options, I'd really like 
to eliminate all uses of -mfpu= (other than auto) from the entire testsuite.  
But that's a different piece of work and I don't want to conflate that change 
with what you're doing here if we can avoid it  (though eliminating -mfpu= in 
this way for this specific rule would be ok if that were the right solution).

> 
>>
>> It's even worse for -mfloat-abi=hard because that will error:
>>
>> cc1: error: ‘-mfloat-abi=hard’: selected architecture lacks an FPU
>>
>>
>>>
>>>>>
>>>>> We can create a new directory, where the .exp test harness would not
>>>>> use additional options, and move tests which need
>>>>> effective-target-arm-arch* there.
>>>>> (that is 189 files out of 541 under advsimd-intrinsics/)
>>>>>
>>>>
>>>> Those tests might benefit from a refactor anyway - the common arm/aarch64 
>>>> tests should probably live in dg.target/aarch, and only the 
>>>> aarch64-specific tests should be in dg.target/aarch64.
>>>>
>>> Indeed, but isn't this an orthogonal problem?
>>>
>>> Thanks,
>>>
>>> Christophe
>>>
>>>> R.
>>>>
>>>>> Thanks,
>>>>>
>>>>> Christophe
>>>>>
>>>>>
>>>>>> R.
>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Christophe
>>>>>>>
>>>>>>>> R.
>>>>>>
>>>>
>>
>> R.

R.

Re: [PATCH 06/10] testsuite: aarch64: arm: Add -mfpu=auto to arm_v8_2a_bf16_neon_ok

Reply via email to