On Mon, 24 Mar 2025 at 16:13, Richard Earnshaw (lists) <richard.earns...@arm.com> wrote: > > On 24/03/2025 14:52, Christophe Lyon wrote: > > On Mon, 24 Mar 2025 at 15:13, Richard Earnshaw (lists) > > <richard.earns...@arm.com> wrote: > >> > >> On 21/03/2025 17:30, Christophe Lyon wrote: > >>> On Fri, 21 Mar 2025 at 16:51, Richard Earnshaw (lists) > >>> <richard.earns...@arm.com> wrote: > >>>> > >>>> On 21/03/2025 15:15, Christophe Lyon wrote: > >>>>> On Fri, 21 Mar 2025 at 15:25, Richard Earnshaw (lists) > >>>>> <richard.earns...@arm.com> wrote: > >>>>>> > >>>>>> On 21/03/2025 14:05, Christophe Lyon wrote: > >>>>>>> On Fri, 21 Mar 2025 at 11:18, Richard Earnshaw (lists) > >>>>>>> <richard.earns...@arm.com> wrote: > >>>>>>>> > >>>>>>>> On 20/03/2025 16:15, Christophe Lyon wrote: > >>>>>>>>> Depending on if/how the testing flags are overridden, the first > >>>>>>>>> value > >>>>>>>>> we try("") might not do what we want. > >>>>>>>>> > >>>>>>>>> For instance, if the whole testsuite is executed with > >>>>>>>>> (A) -mthumb -march=armv7-m -mtune=cortex-m3 -mfloat-abi=softfp > >>>>>>>>> > >>>>>>>>> bf16_neon_ok is first compiled with > >>>>>>>>> (A) (B) > >>>>>>>>> where B = -mcpu=unset -march=armv8.2-a+bf16 > >>>>>>>>> > >>>>>>>>> which is accepted, so a testcase like vld2q_lane_bf16_indices_1.c > >>>>>>>>> is compiled with: > >>>>>>>>> (A) (C) (B) > >>>>>>>>> where C = -mfpu=neon -mfloat-abi=softfp -mcpu=unset -march=armv7-a > >>>>>>>>> -mfpu=neon-fp16 -mfp16-format=ieee > >>>>>>>>> > >>>>>>>>> because advsimd-intrinsics.exp has set additional_flags to (C) > >>>>>>>>> via arm_neon_fp16_ok > >>>>>>>>> > >>>>>>>>> So the testcase is compiled with > >>>>>>>>> [...] -mfpu=neon-fp16 -mcpu=unset -march=armv8.2-a+bf16 > >>>>>>>>> (thus -mfpu=neon-fp16) and bf16 support is disabled. > >>>>>>>>> > >>>>>>>>> The patch replaces "" with -mfpu=auto which matches the intended > >>>>>>>>> effect of -march=armv8.2-a+bf16 as added by bf16_neon_ok, and the > >>>>>>>>> testcase is now compiled with > >>>>>>>>> (A) (C) -mfpu=auto (B) > >>>>>>>>> > >>>>>>>>> However, since this effective-target is also used on aarch64 (which > >>>>>>>>> does not support -mfpu=auto), we do this only on arm. > >>>>>>>>> > >>>>>>>>> This patch improves coverage, and makes > >>>>>>>>> v{ld,st}[234]q_lane_bf16_indices_1.c pass when testsuite flags are > >>>>>>>>> overridden as described above (e.g. for M-profile). > >>>>>>>>> > >>>>>>>>> gcc/testsuite/ > >>>>>>>>> * lib/target-supports.exp > >>>>>>>>> (check_effective_target_arm_v8_2a_bf16_neon_ok_nocache): > >>>>>>>>> Conditionally use -mfpu=auto. > >>>>>>>>> --- > >>>>>>>>> gcc/testsuite/lib/target-supports.exp | 9 ++++++++- > >>>>>>>>> 1 file changed, 8 insertions(+), 1 deletion(-) > >>>>>>>>> > >>>>>>>>> diff --git a/gcc/testsuite/lib/target-supports.exp > >>>>>>>>> b/gcc/testsuite/lib/target-supports.exp > >>>>>>>>> index e2622a445c5..09b16a14024 100644 > >>>>>>>>> --- a/gcc/testsuite/lib/target-supports.exp > >>>>>>>>> +++ b/gcc/testsuite/lib/target-supports.exp > >>>>>>>>> @@ -6871,12 +6871,19 @@ proc add_options_for_arm_fp16fml_neon { > >>>>>>>>> flags } { > >>>>>>>>> proc check_effective_target_arm_v8_2a_bf16_neon_ok_nocache { } { > >>>>>>>>> global et_arm_v8_2a_bf16_neon_flags > >>>>>>>>> set et_arm_v8_2a_bf16_neon_flags "" > >>>>>>>>> + set fpu_auto "" > >>>>>>>>> > >>>>>>>>> if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } { > >>>>>>>>> return 0; > >>>>>>>>> } > >>>>>>>>> > >>>>>>>>> - foreach flags {"" "-mfloat-abi=softfp -mfpu=neon-fp-armv8" > >>>>>>>>> "-mfloat-abi=hard -mfpu=neon-fp-armv8" } { > >>>>>>>>> + if { [istarget arm*-*-*] } { > >>>>>>>>> + set fpu_auto "-mfpu=auto" > >>>>>>>>> + } > >>>>>>>>> + > >>>>>>>>> + foreach flags [list "$fpu_auto" \ > >>>>>>>> > >>>>>>>> Shouldn't we try first with "", even on Arm? Thus > >>>>>>>> foreach flags [list "" "$fpu_auto" \ > >>>>>>>> ... > >>>>>>>> > >>>>>>> I don't think so, that's why I tried to explain above. > >>>>>>> "" is acceptable / accepted in arm_v8_2a_bf16_neon_ok > >>>>>>> (this is (A) (B) above, where the important parts are: > >>>>>>> -march=armv7-m -mcpu=unset -march=armv8.2-a+bf16 > >>>>>>> (so -mfpu is set to the toolchain's default) > >>>>>> > >>>>>> That's never going to work reliably. We need to check, somewhere, the > >>>>>> full set of options we intend to pass to the compilation. We can't > >>>>>> assume that separately testing if A is ok and B is ok => A + B is ok. > >>>>>> > >>>>> > >>>>> Hmmm I think I raised that problem years ago, because of the way the > >>>>> test system is designed... > >>>>> > >>>>>>> > >>>>>>> but then the actual testcase is compiled with additional flags (C) > >>>>>>> defined by the test driver using arm_neon_fp16_ok > >>>>>>> C = -mfpu=neon -mfloat-abi=softfp -mcpu=unset -march=armv7-a > >>>>>>> -mfpu=neon-fp16 -mfp16-format=ieee > >>>>>>> > >>>>>>> so the relevant parts of (A) (C) (B) are: > >>>>>>> -march=armv7-m -mfpu=neon -mcpu=unset -march=armv7-a -mfpu=neon-fp16 > >>>>>>> -mcpu=unset -march=armv8.2-a+bf16 > >>>>>>> which can be simplified into > >>>>>>> -mfpu=neon-fp16 -march=armv8.2-a+bf16 > >>>>>>> > >>>>>>> I think we need to start with -mfpu=auto instead of "", so that when > >>>>>>> -march=armv8.2-a+bf16 takes effect, we have cancelled any other -mfpu. > >>>>>>> > >>>>>> > >>>>>> Ideally a test shouldn't add any options in some test runs; that way > >>>>>> we add some 'base configuration' coverage. We obviously can't do that > >>>>>> everywhere and there may still be cases where the system can't test > >>>>>> anything useful at all (hopefully only when linking, or more is > >>>>>> needed). > >>>>>> > >>>>> > >>>>> Not sure to follow? If a test wants to check that a given feature > >>>>> works as expected, it has to use the appropriate options, doesn't it? > >>>> > >>>> Not if the base configuration for the run (from the compiler config or > >>>> from the RUNTEST flags) already supports that feature. > >>>> > >>> > >>> OK, but isn't this error-prone? > >> > >> This whole overriding options thing is error prone, unfortunately. I've > >> sometimes wondered if we should have some directories in the testsuite > >> where we just suppress all options passed by the user to the framework (so > >> we only need to consider the options configured into the compiler itself). > >> But even then we have to think about header file compatibility, so I'm > >> not sure even that would make things much better. Also, such tests would > >> have to be limited to compile only (we can't assume the right libs will > >> exist for linking, much less running the code). > >> > >> > >>> The effective-target unconditionally adds -mcpu=unset -march=XXX > >>> already, so relying on the implicit default value of -mcpu seems > >>> inconsistent? > >> > >> So perhaps we just have the wrong effective target? > >> > > If we consider the one this patch modifies: arm_v8_2a_bf16_neon_ok > > AFAIU, it looks for the right set of flags needed to have > > __ARM_FEATURE_BF16_VECTOR_ARITHMETIC defined, > > so it tries -mfloat-abi / -mfpu combinations in addition to > > -mcpu=unset -march=armv8.2-a+bf16 > > It seems consistent with the other numerous effective targets we have? > > > > Right, but testing to see if __ARM_FEATURE_BF16_VECTOR_ARITHMETIC is defined > already (with no overrides) should be done first. Otherwise we might as well > abandon pretty much all of the options and override everything straight away. >
Agreed, I'm not sure if testing with "" is useful at all. AFAIU, all the tests which use require-effectve-target ensure that a given feature is available, and possibly add some flags to ensure that. They don't care how this is achieved (whether with "" or with "-mfloat-abi=XXX -mfpu=YYY"), they just rely on the feature being enabled. We still have to detect whether -mfloat-abi=XXX is needed in case the toolchain does not support all of soft/softfp/hard though. > > > >>> The would also lead to confusion when comparing two logs with the > >>> exact same options, but different result because the toolchain was > >>> built with a different --with-fpu= option (which IIRC does not appear > >>> in gcc.log). > >>> > >>> Not sure what accepting "" rather than -mfpu=auto gives us? Given we > >>> are checking which flags (-mfloat-abi for instance) are need to > >>> support -march=XXX, don't we want to make sure the corresponding fpu > >>> is in use? > >> > >> See below. > >> > >>> > >>>>> > >>>>>>> > >>>>>>> > >>>>>>>> > >>>>>>>>> + "-mfloat-abi=softfp -mfpu=neon-fp-armv8" \ > >>>>>>>>> + "-mfloat-abi=hard -mfpu=neon-fp-armv8" ] { > >>>>>>>>> if { [check_no_compiler_messages_nocache > >>>>>>>>> arm_v8_2a_bf16_neon_ok object { > >>>>>>>>> #include <arm_neon.h> > >>>>>>>>> #if !defined (__ARM_FEATURE_BF16_VECTOR_ARITHMETIC) > >>>>>>>> > >>>>>>>> In fact, since aarch64 doesn't support -mfpu at all, only "" is ever > >>>>>>>> going to work here, so perhaps we can recast all this code, perhaps > >>>>>>>> along the lines of that in > >>>>>>>> check_effective_target_arm_neon_h_ok_nocache so that we don't need > >>>>>>>> to actually try to include arm_neon.h at all while figuring out the > >>>>>>>> options. > >>>>>>>> > >>>>>>> > >>>>>>> Maybe. > >>>>>>> > >>>>>>> OTOH, actually including arm_neon.h is guaranteed to give the right > >>>>>>> answer to the question "which flags do we need to be able to include > >>>>>>> arm_neon.h?", rather than using several conditions that are supposed > >>>>>>> to match what arm_neon.h does? > >>>>>>> > >>>>>> > >>>>>> Working out whether arm_neon.h can be included can be done with > >>>>>> arm_neon_h_ok. That's currently Arm only, but it's trivial to extend > >>>>>> it to aarch64. Once we have that, we can write some rules that build > >>>>>> on that base to get other features that we might desire. But we must, > >>>>>> at some point check, rather than assume, that combining options from > >>>>>> different rules will result in something sane. > >>>>>> > >>>>> > >>>>> I suspect most tests which currently use arm*neon* effective targets > >>>>> actually include arm_neon.h, they should arm_neon_h_ok. > >>>>> > >>>>> But I'm not sure what we can do to make sure, rather than assume that > >>>>> A + C + B does what we want, given that: > >>>>> - A is defined by the user who runs the testsuite (via RUNTESTFLAGS / > >>>>> target_board) > >>>>> - C is defined by the test harness (advsimd-intrinsics.exp) > >>>>> - B is defined as needed by individual tests > >>>>> > >>>>> At least we have no control on A. > >>>> > >>>> We have no control of A, but any require_... tests will add A to the > >>>> option list they test; so we can override them if needed and check the > >>>> override works. > >>> > >>> Yes, that's already what currently happens. > >>> > >>>> C is more complex - we perhaps need some bespoke checks in > >>>> advsimd-intrinsics.exp that can be used to ensure that it's base flags > >>>> are added to any tests we run. I need to think about that one a bit; > >>>> perhaps we can arrange to add these flags to A before the individual > >>>> tests run their additional checks. > >>>> > >>> what is the advantage compared to replacing "" with -mfpu=auto, which > >>> seems simpler? > >> > >> Consider > >> > >> -mcpu=cortex-a53 -march=armv8-a+crc -mfpu=neon-fp-armv8 -mfloat-abi=softfp > >> > > where would that list come from? Is that user-supplied overrides, or a > > concatenation of effective targets? > > Yes, user-supplied. It's based on (though not identical to) an option set > Torbjorn tests for his m-profile devices. I guess it could also come from > the compiler-configured defaults as well, though: --with-cpu=cortex-a53 > --with-arch=armv8-a+crc --with-fpu=neon-fp-armv8 --with-float=softfp would > have the same effect. > > > > >> which works just fine. > >> > >> If you add -mfpu=auto (overriding the -mfpu=noen-fp-armv8) to that you get > >> > >> cc1: warning: switch ‘-mcpu=cortex-a53’ conflicts with switch > >> ‘-march=armv8-a+crc’ > >> > >> because the underlying arch flags don't match any more. > > > > If -mtpu=auto would be added by an effective target (like in this > > patch for instance), > > it would also add -mcpu=unset before the relevant -march, so that > > particular sequence shouldn't be generated? > > Adding -mfpu=auto when we also add/override the CPU/arch settings is one > thing (since we're saying that those options will be providing the right > information about the FPU). It's also the way ETs like arm_v8a_hard work. > But the more I think about it, the more I'm sure that just using it on its > own is not the right way forward. > In this case it's not on its own (or I misunderstand what you mean): we always add -mcpu=unset -march=armv8.2-a+bf16, so we'd effectively try -mfpu=auto -mcpu=unset -march=armv8.2-a+bf16 first. What do you have in mind as a way forward? > > > > -march=armv8-a+crc is added by arm_crc_ok, but it also adds > > -mcpu=unset, so that would cancel -mcpu=cortex-a53 > > -mfpu=neon-fp-armv8 can be added by several effective targets, all of > > which add -mcpu=unset already > > and if -mfpu=auto alone would cause an error, they would iterate and > > could select -mfpu=neon-fp-armv8 is it works (arm_v8_neon_ok for > > instance) > > > > Apart from a small set of tests for the legacy -mfpu options, I'd really like > to eliminate all uses of -mfpu= (other than auto) from the entire testsuite. > But that's a different piece of work and I don't want to conflate that change > with what you're doing here if we can avoid it (though eliminating -mfpu= in > this way for this specific rule would be ok if that were the right solution). > Do you mean for instance in arm_v8_2a_bf16_neon_ok_nocache, replacing the existing -mfpu=neon-fp-armv8 with -mfpu=auto? > > > >> > >> It's even worse for -mfloat-abi=hard because that will error: > >> > >> cc1: error: ‘-mfloat-abi=hard’: selected architecture lacks an FPU > >> > >> > >>> > >>>>> > >>>>> We can create a new directory, where the .exp test harness would not > >>>>> use additional options, and move tests which need > >>>>> effective-target-arm-arch* there. > >>>>> (that is 189 files out of 541 under advsimd-intrinsics/) > >>>>> > >>>> > >>>> Those tests might benefit from a refactor anyway - the common > >>>> arm/aarch64 tests should probably live in dg.target/aarch, and only the > >>>> aarch64-specific tests should be in dg.target/aarch64. > >>>> > >>> Indeed, but isn't this an orthogonal problem? > >>> > >>> Thanks, > >>> > >>> Christophe > >>> > >>>> R. > >>>> > >>>>> Thanks, > >>>>> > >>>>> Christophe > >>>>> > >>>>> > >>>>>> R. > >>>>>> > >>>>>>> Thanks, > >>>>>>> > >>>>>>> Christophe > >>>>>>> > >>>>>>>> R. > >>>>>> > >>>> > >> > >> R. > > R.