Re: [PATCH 06/10] testsuite: aarch64: arm: Add -mfpu=auto to arm_v8_2a_bf16_neon_ok

Richard Earnshaw (lists) Mon, 24 Mar 2025 09:14:35 -0700

On 21/03/2025 17:30, Christophe Lyon wrote:
> On Fri, 21 Mar 2025 at 16:51, Richard Earnshaw (lists)
> <richard.earns...@arm.com> wrote:
>>
>> On 21/03/2025 15:15, Christophe Lyon wrote:
>>> On Fri, 21 Mar 2025 at 15:25, Richard Earnshaw (lists)
>>> <richard.earns...@arm.com> wrote:
>>>>
>>>> On 21/03/2025 14:05, Christophe Lyon wrote:
>>>>> On Fri, 21 Mar 2025 at 11:18, Richard Earnshaw (lists)
>>>>> <richard.earns...@arm.com> wrote:
>>>>>>
>>>>>> On 20/03/2025 16:15, Christophe Lyon wrote:
>>>>>>> Depending on if/how the testing flags are overridden, the first value
>>>>>>> we try("") might not do what we want.
>>>>>>>
>>>>>>> For instance, if the whole testsuite is executed with
>>>>>>> (A) -mthumb -march=armv7-m -mtune=cortex-m3 -mfloat-abi=softfp
>>>>>>>
>>>>>>> bf16_neon_ok is first compiled with
>>>>>>> (A) (B)
>>>>>>> where B = -mcpu=unset -march=armv8.2-a+bf16
>>>>>>>
>>>>>>> which is accepted, so a testcase like vld2q_lane_bf16_indices_1.c
>>>>>>> is compiled with:
>>>>>>> (A) (C) (B)
>>>>>>> where C = -mfpu=neon -mfloat-abi=softfp -mcpu=unset -march=armv7-a 
>>>>>>> -mfpu=neon-fp16 -mfp16-format=ieee
>>>>>>>
>>>>>>> because advsimd-intrinsics.exp has set additional_flags to (C)
>>>>>>> via arm_neon_fp16_ok
>>>>>>>
>>>>>>> So the testcase is compiled with
>>>>>>> [...] -mfpu=neon-fp16 -mcpu=unset -march=armv8.2-a+bf16
>>>>>>> (thus -mfpu=neon-fp16) and bf16 support is disabled.
>>>>>>>
>>>>>>> The patch replaces "" with -mfpu=auto which matches the intended
>>>>>>> effect of -march=armv8.2-a+bf16 as added by bf16_neon_ok, and the
>>>>>>> testcase is now compiled with
>>>>>>> (A) (C) -mfpu=auto (B)
>>>>>>>
>>>>>>> However, since this effective-target is also used on aarch64 (which
>>>>>>> does not support -mfpu=auto), we do this only on arm.
>>>>>>>
>>>>>>> This patch improves coverage, and makes
>>>>>>> v{ld,st}[234]q_lane_bf16_indices_1.c pass when testsuite flags are
>>>>>>> overridden as described above (e.g. for M-profile).
>>>>>>>
>>>>>>>       gcc/testsuite/
>>>>>>>       * lib/target-supports.exp
>>>>>>>       (check_effective_target_arm_v8_2a_bf16_neon_ok_nocache):
>>>>>>>       Conditionally use -mfpu=auto.
>>>>>>> ---
>>>>>>>  gcc/testsuite/lib/target-supports.exp | 9 ++++++++-
>>>>>>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/gcc/testsuite/lib/target-supports.exp 
>>>>>>> b/gcc/testsuite/lib/target-supports.exp
>>>>>>> index e2622a445c5..09b16a14024 100644
>>>>>>> --- a/gcc/testsuite/lib/target-supports.exp
>>>>>>> +++ b/gcc/testsuite/lib/target-supports.exp
>>>>>>> @@ -6871,12 +6871,19 @@ proc add_options_for_arm_fp16fml_neon { flags } 
>>>>>>> {
>>>>>>>  proc check_effective_target_arm_v8_2a_bf16_neon_ok_nocache { } {
>>>>>>>      global et_arm_v8_2a_bf16_neon_flags
>>>>>>>      set et_arm_v8_2a_bf16_neon_flags ""
>>>>>>> +    set fpu_auto ""
>>>>>>>
>>>>>>>      if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } {
>>>>>>>       return 0;
>>>>>>>      }
>>>>>>>
>>>>>>> -    foreach flags {"" "-mfloat-abi=softfp -mfpu=neon-fp-armv8" 
>>>>>>> "-mfloat-abi=hard -mfpu=neon-fp-armv8" } {
>>>>>>> +    if { [istarget arm*-*-*] } {
>>>>>>> +     set fpu_auto "-mfpu=auto"
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    foreach flags [list "$fpu_auto" \
>>>>>>
>>>>>> Shouldn't we try first with "", even on Arm?  Thus
>>>>>>        foreach flags [list "" "$fpu_auto" \
>>>>>> ...
>>>>>>
>>>>> I don't think so, that's why I tried to explain above.
>>>>> "" is acceptable / accepted in arm_v8_2a_bf16_neon_ok
>>>>> (this is (A) (B) above, where the important parts are:
>>>>> -march=armv7-m -mcpu=unset -march=armv8.2-a+bf16
>>>>> (so -mfpu is set to the toolchain's default)
>>>>
>>>> That's never going to work reliably.  We need to check, somewhere, the 
>>>> full set of options we intend to pass to the compilation.  We can't assume 
>>>> that separately testing if A is ok and B is ok => A + B is ok.
>>>>
>>>
>>> Hmmm I think I raised that problem years ago, because of the way the
>>> test system is designed...
>>>
>>>>>
>>>>> but then the actual testcase is compiled with additional flags (C)
>>>>> defined by the test driver using arm_neon_fp16_ok
>>>>> C = -mfpu=neon -mfloat-abi=softfp -mcpu=unset -march=armv7-a
>>>>> -mfpu=neon-fp16 -mfp16-format=ieee
>>>>>
>>>>> so the relevant parts of (A) (C) (B) are:
>>>>> -march=armv7-m  -mfpu=neon -mcpu=unset -march=armv7-a -mfpu=neon-fp16
>>>>> -mcpu=unset -march=armv8.2-a+bf16
>>>>> which can be simplified into
>>>>> -mfpu=neon-fp16 -march=armv8.2-a+bf16
>>>>>
>>>>> I think we need to start with -mfpu=auto instead of "", so that when
>>>>> -march=armv8.2-a+bf16 takes effect, we have cancelled any other -mfpu.
>>>>>
>>>>
>>>> Ideally a test shouldn't add any options in some test runs; that way we 
>>>> add some 'base configuration' coverage.  We obviously can't do that 
>>>> everywhere and there may still be cases where the system can't test 
>>>> anything useful at all (hopefully only when linking, or more is needed).
>>>>
>>>
>>> Not sure to follow? If a test wants to check that a given feature
>>> works as expected, it has to use the appropriate options, doesn't it?
>>
>> Not if the base configuration for the run (from the compiler config or from 
>> the RUNTEST flags) already supports that feature.
>>
> 
> OK, but isn't this error-prone?


This whole overriding options thing is error prone, unfortunately.  I've 
sometimes wondered if we should have some directories in the testsuite where we 
just suppress all options passed by the user to the framework (so we only need 
to consider the options configured into the compiler itself).  But even then we 
have to think about header file compatibility, so I'm not sure even that would 
make things much better.  Also, such tests would have to be limited to compile 
only (we can't assume the right libs will exist for linking, much less running 
the code).


> The effective-target unconditionally adds -mcpu=unset -march=XXX
> already, so relying on the implicit default value of -mcpu seems
> inconsistent?

So perhaps we just have the wrong effective target?  

> The would also lead to confusion when comparing two logs with the
> exact same options, but different result because the toolchain was
> built with a different --with-fpu= option (which IIRC does not appear
> in gcc.log).
> 
> Not sure what accepting "" rather than -mfpu=auto gives us?  Given we
> are checking which flags (-mfloat-abi for instance) are need to
> support -march=XXX, don't we want to make sure the corresponding fpu
> is in use?

See below.

> 
>>>
>>>>>
>>>>>
>>>>>>
>>>>>>> +                    "-mfloat-abi=softfp -mfpu=neon-fp-armv8" \
>>>>>>> +                    "-mfloat-abi=hard -mfpu=neon-fp-armv8" ] {
>>>>>>>       if { [check_no_compiler_messages_nocache arm_v8_2a_bf16_neon_ok 
>>>>>>> object {
>>>>>>>           #include <arm_neon.h>
>>>>>>>           #if !defined (__ARM_FEATURE_BF16_VECTOR_ARITHMETIC)
>>>>>>
>>>>>> In fact, since aarch64 doesn't support -mfpu at all, only "" is ever 
>>>>>> going to work here, so perhaps we can recast all this code, perhaps 
>>>>>> along the lines of that in check_effective_target_arm_neon_h_ok_nocache 
>>>>>> so that we don't need to actually try to include arm_neon.h at all while 
>>>>>> figuring out the options.
>>>>>>
>>>>>
>>>>> Maybe.
>>>>>
>>>>> OTOH, actually including arm_neon.h is guaranteed to give the right
>>>>> answer to the question "which flags do we need to be able to include
>>>>> arm_neon.h?", rather than using several conditions that are supposed
>>>>> to match what arm_neon.h does?
>>>>>
>>>>
>>>> Working out whether arm_neon.h can be included can be done with 
>>>> arm_neon_h_ok.  That's currently Arm only, but it's trivial to extend it 
>>>> to aarch64.  Once we have that, we can write some rules that build on that 
>>>> base to get other features that we might desire.  But we must, at some 
>>>> point check, rather than assume, that combining options from different 
>>>> rules will result in something sane.
>>>>
>>>
>>> I suspect most tests which currently use arm*neon* effective targets
>>> actually include arm_neon.h, they should arm_neon_h_ok.
>>>
>>> But I'm not sure what we can do to make sure, rather than assume that
>>> A + C + B does what we want, given that:
>>> - A is defined by the user who runs the testsuite (via RUNTESTFLAGS /
>>> target_board)
>>> - C is defined by the test harness (advsimd-intrinsics.exp)
>>> - B is defined as needed by individual tests
>>>
>>> At least we have no control on A.
>>
>> We have no control of A, but any require_... tests will add A to the option 
>> list they test; so we can override them if needed and check the override 
>> works.
> 
> Yes, that's already what currently happens.
> 
>> C is more complex - we perhaps need some bespoke checks in 
>> advsimd-intrinsics.exp that can be used to ensure that it's base flags are 
>> added to any tests we run.  I need to think about that one a bit; perhaps we 
>> can arrange to add these flags to A before the individual tests run their 
>> additional checks.
>>
> what is the advantage compared to replacing "" with -mfpu=auto, which
> seems simpler?

Consider

 -mcpu=cortex-a53 -march=armv8-a+crc -mfpu=neon-fp-armv8 -mfloat-abi=softfp

which works just fine.

If you add -mfpu=auto (overriding the -mfpu=noen-fp-armv8) to that you get

cc1: warning: switch ‘-mcpu=cortex-a53’ conflicts with switch 
‘-march=armv8-a+crc’

because the underlying arch flags don't match any more.

It's even worse for -mfloat-abi=hard because that will error:

cc1: error: ‘-mfloat-abi=hard’: selected architecture lacks an FPU


> 
>>>
>>> We can create a new directory, where the .exp test harness would not
>>> use additional options, and move tests which need
>>> effective-target-arm-arch* there.
>>> (that is 189 files out of 541 under advsimd-intrinsics/)
>>>
>>
>> Those tests might benefit from a refactor anyway - the common arm/aarch64 
>> tests should probably live in dg.target/aarch, and only the aarch64-specific 
>> tests should be in dg.target/aarch64.
>>
> Indeed, but isn't this an orthogonal problem?
> 
> Thanks,
> 
> Christophe
> 
>> R.
>>
>>> Thanks,
>>>
>>> Christophe
>>>
>>>
>>>> R.
>>>>
>>>>> Thanks,
>>>>>
>>>>> Christophe
>>>>>
>>>>>> R.
>>>>
>>

R.

Re: [PATCH 06/10] testsuite: aarch64: arm: Add -mfpu=auto to arm_v8_2a_bf16_neon_ok

Reply via email to