On 21/03/2025 14:05, Christophe Lyon wrote: > On Fri, 21 Mar 2025 at 11:18, Richard Earnshaw (lists) > <richard.earns...@arm.com> wrote: >> >> On 20/03/2025 16:15, Christophe Lyon wrote: >>> Depending on if/how the testing flags are overridden, the first value >>> we try("") might not do what we want. >>> >>> For instance, if the whole testsuite is executed with >>> (A) -mthumb -march=armv7-m -mtune=cortex-m3 -mfloat-abi=softfp >>> >>> bf16_neon_ok is first compiled with >>> (A) (B) >>> where B = -mcpu=unset -march=armv8.2-a+bf16 >>> >>> which is accepted, so a testcase like vld2q_lane_bf16_indices_1.c >>> is compiled with: >>> (A) (C) (B) >>> where C = -mfpu=neon -mfloat-abi=softfp -mcpu=unset -march=armv7-a >>> -mfpu=neon-fp16 -mfp16-format=ieee >>> >>> because advsimd-intrinsics.exp has set additional_flags to (C) >>> via arm_neon_fp16_ok >>> >>> So the testcase is compiled with >>> [...] -mfpu=neon-fp16 -mcpu=unset -march=armv8.2-a+bf16 >>> (thus -mfpu=neon-fp16) and bf16 support is disabled. >>> >>> The patch replaces "" with -mfpu=auto which matches the intended >>> effect of -march=armv8.2-a+bf16 as added by bf16_neon_ok, and the >>> testcase is now compiled with >>> (A) (C) -mfpu=auto (B) >>> >>> However, since this effective-target is also used on aarch64 (which >>> does not support -mfpu=auto), we do this only on arm. >>> >>> This patch improves coverage, and makes >>> v{ld,st}[234]q_lane_bf16_indices_1.c pass when testsuite flags are >>> overridden as described above (e.g. for M-profile). >>> >>> gcc/testsuite/ >>> * lib/target-supports.exp >>> (check_effective_target_arm_v8_2a_bf16_neon_ok_nocache): >>> Conditionally use -mfpu=auto. >>> --- >>> gcc/testsuite/lib/target-supports.exp | 9 ++++++++- >>> 1 file changed, 8 insertions(+), 1 deletion(-) >>> >>> diff --git a/gcc/testsuite/lib/target-supports.exp >>> b/gcc/testsuite/lib/target-supports.exp >>> index e2622a445c5..09b16a14024 100644 >>> --- a/gcc/testsuite/lib/target-supports.exp >>> +++ b/gcc/testsuite/lib/target-supports.exp >>> @@ -6871,12 +6871,19 @@ proc add_options_for_arm_fp16fml_neon { flags } { >>> proc check_effective_target_arm_v8_2a_bf16_neon_ok_nocache { } { >>> global et_arm_v8_2a_bf16_neon_flags >>> set et_arm_v8_2a_bf16_neon_flags "" >>> + set fpu_auto "" >>> >>> if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } { >>> return 0; >>> } >>> >>> - foreach flags {"" "-mfloat-abi=softfp -mfpu=neon-fp-armv8" >>> "-mfloat-abi=hard -mfpu=neon-fp-armv8" } { >>> + if { [istarget arm*-*-*] } { >>> + set fpu_auto "-mfpu=auto" >>> + } >>> + >>> + foreach flags [list "$fpu_auto" \ >> >> Shouldn't we try first with "", even on Arm? Thus >> foreach flags [list "" "$fpu_auto" \ >> ... >> > I don't think so, that's why I tried to explain above. > "" is acceptable / accepted in arm_v8_2a_bf16_neon_ok > (this is (A) (B) above, where the important parts are: > -march=armv7-m -mcpu=unset -march=armv8.2-a+bf16 > (so -mfpu is set to the toolchain's default)
That's never going to work reliably. We need to check, somewhere, the full set of options we intend to pass to the compilation. We can't assume that separately testing if A is ok and B is ok => A + B is ok. > > but then the actual testcase is compiled with additional flags (C) > defined by the test driver using arm_neon_fp16_ok > C = -mfpu=neon -mfloat-abi=softfp -mcpu=unset -march=armv7-a > -mfpu=neon-fp16 -mfp16-format=ieee > > so the relevant parts of (A) (C) (B) are: > -march=armv7-m -mfpu=neon -mcpu=unset -march=armv7-a -mfpu=neon-fp16 > -mcpu=unset -march=armv8.2-a+bf16 > which can be simplified into > -mfpu=neon-fp16 -march=armv8.2-a+bf16 > > I think we need to start with -mfpu=auto instead of "", so that when > -march=armv8.2-a+bf16 takes effect, we have cancelled any other -mfpu. > Ideally a test shouldn't add any options in some test runs; that way we add some 'base configuration' coverage. We obviously can't do that everywhere and there may still be cases where the system can't test anything useful at all (hopefully only when linking, or more is needed). > > >> >>> + "-mfloat-abi=softfp -mfpu=neon-fp-armv8" \ >>> + "-mfloat-abi=hard -mfpu=neon-fp-armv8" ] { >>> if { [check_no_compiler_messages_nocache arm_v8_2a_bf16_neon_ok >>> object { >>> #include <arm_neon.h> >>> #if !defined (__ARM_FEATURE_BF16_VECTOR_ARITHMETIC) >> >> In fact, since aarch64 doesn't support -mfpu at all, only "" is ever going >> to work here, so perhaps we can recast all this code, perhaps along the >> lines of that in check_effective_target_arm_neon_h_ok_nocache so that we >> don't need to actually try to include arm_neon.h at all while figuring out >> the options. >> > > Maybe. > > OTOH, actually including arm_neon.h is guaranteed to give the right > answer to the question "which flags do we need to be able to include > arm_neon.h?", rather than using several conditions that are supposed > to match what arm_neon.h does? > Working out whether arm_neon.h can be included can be done with arm_neon_h_ok. That's currently Arm only, but it's trivial to extend it to aarch64. Once we have that, we can write some rules that build on that base to get other features that we might desire. But we must, at some point check, rather than assume, that combining options from different rules will result in something sane. R. > Thanks, > > Christophe > >> R.