On 26/05/2020 18:04, Christophe Lyon via Gcc wrote:
> On Tue, 19 May 2020 at 13:28, Richard Earnshaw
> <richard.earns...@foss.arm.com> wrote:
>>
>> On 11/05/2020 17:43, Christophe Lyon via Gcc wrote:
>>> Hi,
>>>
>>> As you may know, I've been running validations of GCC trunk in many
>>> configurations for Arm and AArch64.
>>>
>>> I was recently trying to do some cleanup in the new Bfloat16, MVE,
>>> CDE and ACLE tests, because in several configurations I see 300-400
>>> FAILs, mainly in these areas, caused by "testisms". The goal is to
>>> avoid wasting time on the same failure reports when checking what
>>> needs fixing. I thought this would be quick and easy, but it is
>>> tedious because of the numerous combinations of options and
>>> configurations available on Arm.
>>>
>>> Sorry for the very long email; this is hard to describe and
>>> summarize, but I'd like to try nonetheless, hoping that we can make
>>> testing easier and more efficient :-). Most of the time the
>>> problems I found are with the tests rather than real compiler bugs,
>>> so quite some time is wasted on them.
>>>
>>> Here is a list of problems, starting with the tricky dependencies
>>> around -mfloat-abi=XXX:
>>>
>>> * Some targets do not support multilibs (e.g. arm-linux-gnueabi[hf]
>>> with glibc), or one can decide not to build both the hard and soft
>>> FP multilibs. This generally becomes a problem when including
>>> stdint.h (used by arm_neon.h, arm_acle.h, ...), leading to a
>>> compiler error for lack of gnu/stub*.h for the missing float-abi.
>>> If you add -mthumb to the picture, it becomes quite complex (e.g.
>>> -mfloat-abi=hard is not supported on Thumb-1).
>>>
>>> Consider mytest.c, which does not depend on any include file and
>>> has:
>>> /* { dg-options "-mfloat-abi=hard" } */
>>>
>>> If GCC is configured for arm-linux-gnueabi --with-cpu=cortex-a9
>>> --with-fpu=neon, the test PASSes with plain 'make check'.
>>> With 'make check' with --target_board=-march=armv5t/-mthumb, the
>>> test FAILs:
>>> sorry, unimplemented: Thumb-1 hard-float VFP ABI
>>>
>>> If I add
>>> /* { dg-require-effective-target arm_hard_ok } */
>>> 'make check' with --target_board=-march=armv5t/-mthumb is now
>>> UNSUPPORTED (which is OK), but plain 'make check' is now also
>>> UNSUPPORTED, because arm_hard_ok detects that we lack the
>>> -mfloat-abi=hard multilib. So we lose a PASS.
>>>
>>> If I configure GCC for arm-linux-gnueabihf instead, then:
>>> plain 'make check' PASSes;
>>> 'make check' with --target_board=-march=armv5t/-mthumb FAILs;
>>> and with
>>> /* { dg-require-effective-target arm_hard_ok } */
>>> 'make check' with --target_board=-march=armv5t/-mthumb is now
>>> UNSUPPORTED and plain 'make check' PASSes.
>>>
>>> So it seems the best option is to add
>>> /* { dg-require-effective-target arm_hard_ok } */
>>> although it makes the test UNSUPPORTED on arm-linux-gnueabi even in
>>> cases where it could PASS.
>>>
>>> Is there consensus that this is the right way?
>>>
>>> * In GCC's DejaGnu helpers, the queries for -mfloat-abi=hard and
>>> -march=XXX are generally independent: if you query for
>>> -mfloat-abi=hard support, the helper does so in the absence of any
>>> -march=XXX option that the testcase may also be using. So, if GCC
>>> is configured with its default cpu/fpu, -mfloat-abi=hard is
>>> rejected for lack of an FPU on the default CPU, but if GCC is
>>> configured with a suitable cpu/fpu pair, -mfloat-abi=hard is
>>> accepted.
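>>>
>>> For illustration, a complete (made-up) testcase combining the two
>>> kinds of directives would be:
>>>
>>> /* { dg-do compile } */
>>> /* { dg-require-effective-target arm_hard_ok } */
>>> /* { dg-options "-march=armv8-a -mfloat-abi=hard" } */
>>> int f (void) { return 0; }
>>>
>>> Here arm_hard_ok checks -mfloat-abi=hard on its own; it knows
>>> nothing about the -march=armv8-a in dg-options, so its verdict may
>>> not match what happens when the test itself is compiled.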
>>>
>>> I faced this problem when I tried to "fix" the order in which we
>>> try options in arm_v8_2a_bf16_neon_ok (see
>>> https://gcc.gnu.org/pipermail/gcc-patches/2020-April/544654.html).
>>>
>>> I faced similar problems while working on a patch of mine for a bug
>>> in IRQ handlers, whose behaviour differs depending on the FP ABI in
>>> use: I have the feeling that I spent too much time writing the
>>> tests to the detriment of the patch itself...
>>>
>>> I also noticed that Richard Sandiford probably faced similar issues
>>> with his recent fix for "no_unique_address", where he finally added
>>> arm_arch_v8a_hard_ok to check armv8-a + the neon-fp-armv8 FPU +
>>> float-abi=hard at the same time.
>>>
>>> Maybe we could decide on a consistent and simpler way of checking
>>> such things?
>>>
>>> * A metric for this complexity could be the number of arm
>>> effective-targets; a quick and not fully accurate
>>> grep | sed | sort | uniq -c | sort -n
>>> on target-supports.exp ends with:
>>> 9 mips
>>> 16 aarch64
>>> 21 powerpc
>>> 97 vect
>>> 106 arm
>>> (this does not count the effective-targets generated by Tcl code,
>>> e.g. arm_arch_FUNC_ok)
>>>
>>> This probably explains why it's hard to get test directives right
>>> :-)
>>>
>>> I've not thought about how we could reduce that number...
>>>
>>> * Finally, I'm wondering about the most appropriate way of
>>> configuring GCC and running the tests.
>>>
>>> So far, for most of the configurations I test, I use different
>>> --with-cpu/--with-fpu/--with-mode configure flags for each
>>> toolchain configuration, and I rarely override the flags at testing
>>> time. I also disable multilibs to save build time and (scratch)
>>> disk space. (See
>>> https://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/0latest/report-build-info.html
>>> for the current list; each line corresponds to a clean build +
>>> 'make check' job -- so there are 15 different toolchain configs for
>>> arm-linux-gnueabihf, for instance.)
>>>
>>> However, I think this may not be appropriate, at least for the
>>> arm-eabi toolchains: I suspect that vendors who support several
>>> SoCs generally ship one binary toolchain built with the default
>>> cpu/fpu/mode and the appropriate multilibs (aprofile or rmprofile),
>>> and the associated IDE adds the right -mcpu/-mfpu flags (see the
>>> Arm Embedded toolchain, or ST CubeMX for STM32). So it seems to me
>>> that the "appropriate" way of testing such a toolchain is to build
>>> it with the default settings and appropriate multilibs, and to add
>>> the needed -mcpu/-mfpu variants at 'make check' time.
>>>
>>> I would still build one toolchain per configuration I want to
>>> test, rather than use runtest's capability to iterate over several
>>> combinations: this way I can run the tests in parallel and reduce
>>> the total time needed to get the results.
>>>
>>> One can compare the results of both options with the two cortex-m33
>>> lines in the above table (target arm-none-eabi).
>>>
>>> In the first one, GCC is configured for cortex-m33, and the tests
>>> are executed via plain 'make check': 401 failures in gcc (duration
>>> ~2h, disk space 14GB).
>>>
>>> In the second one, GCC is configured with the default cpu/fpu,
>>> multilibs are enabled, and I use test flags suitable for
>>> cortex-m33: now only 73 failures for gcc (duration ~3h15, disk
>>> space 26GB). Note that there are more failures for g++ and
>>> libstdc++ than in the previous line; I haven't fully checked why --
>>> for libstdc++ there are spurious -march=armv8-m.main+fp flags in
>>> the log. So this is not the magic bullet.
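>>>
>>> Concretely, the two setups are along these lines (a sketch, not my
>>> exact commands -- configure paths and extra options are omitted):
>>>
>>> # 1st line: toolchain dedicated to cortex-m33, plain test run
>>> .../configure --target=arm-none-eabi --with-cpu=cortex-m33 \
>>>   --disable-multilib ...
>>> make check
>>>
>>> # 2nd line: default cpu/fpu, multilibs built, flags at test time
>>> .../configure --target=arm-none-eabi --with-multilib-list=rmprofile ...
>>> make check \
>>>   RUNTESTFLAGS="--target_board=-mthumb/-mcpu=cortex-m33/-mfloat-abi=hard"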
>>>
>>> Unfortunately, this means every test with the arm_hard_ok
>>> effective-target would be unsupported (for lack of an FPU on the
>>> default CPU) whatever the validation cflags. The increased build
>>> time (many multilibs built for nothing) will also reduce the
>>> validation bandwidth (I hope the increased scratch disk space will
>>> not be a problem with my IT...).
>>>
>>> OTOH, I have a feeling that arm-linux-gnueabi* toolchain vendors
>>> probably prefer to tune them for their preferred default CPU. For
>>> instance, I have an Arm board running Ubuntu with gcc-5.4
>>> configured --with-arch=armv7-a --with-fpu=vfpv3-d16
>>> --with-float=hard --with-mode=thumb.
>>>
>>> If this is right, it would mean I should keep the configurations I
>>> currently use for arm-linux* (no multilib, rely on the default
>>> cpu/fpu).
>>>
>>> ** Regarding the flags used for testing, I'm also wondering which
>>> is more appropriate: -mcpu or -march. Both probably have pros and
>>> cons.
>>>
>>> In https://gcc.gnu.org/pipermail/gcc/2019-September/230258.html, I
>>> described a problem where it seems that one expects the tests to
>>> run with -march=XXX.
>>>
>>> Another log of mine has an effective-target helper compiled with:
>>> -mthumb -mcpu=cortex-m33 -mfloat-abi=hard -mfloat-abi=softfp
>>> -mfpu=auto -march=armv8.1-m.main+mve.fp -mthumb
>>> which produces this warning:
>>> cc1: warning: switch '-mcpu=cortex-m33' conflicts with
>>> '-march=armv8.1-m.main' switch
>>> which looks suspicious. Still, running the tests in multiple ways
>>> surely helps uncover bugs...
>>>
>>> In summary, I'd like to gather opinions on:
>>> * the appropriate usage of dg-require-effective-target arm_hard_ok;
>>> * how to improve float-abi support detection in combination with
>>> the architecture level;
>>> * hopefully, a consensus on how to configure the toolchain and run
>>> the tests. I'm suggesting default config + multilibs + runtest
>>> flags for arm-eabi, and a chosen default cpu/fpu + fewer runtest
>>> flags for arm-linux*.
>>>
>>> Thanks for reading that far :-)
>>>
>>> Christophe
>>>
>>
>
> Thanks for your answer.
>
>> I've been pondering this for some time now (well before you sent
>> your mail).
>>
>> My feeling is that trying to control this via DejaGnu options is
>> just getting too fiddly. Perhaps a new approach is called for.
>>
>> My thoughts are along the lines of reworking the tests to use
>>
>> #pragma target <option>
>>
>> etc. (or the attribute equivalent) to set the compilation state to
>> something appropriate for the test, so that the output is reasonable
>> for that, and then we can stabilize the test.
>>
>> This only works for assembly tests, not for anything that requires
>> linking or execution: but for those tests we shouldn't be looking
>> for a specific output but for a specific behaviour, and we can
>> tolerate more variation in the instructions that implement that
>> behaviour (hybrid tests would need splitting).
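>>
>> For instance, a scan-assembler test might then look something like
>> this (purely a sketch: the option spellings and the scan pattern are
>> invented, and as noted further down most of the required options
>> cannot be controlled this way today):
>>
>> /* { dg-do compile } */
>> #pragma GCC target ("arch=armv8.1-m.main+mve,float-abi=hard")
>> int f (int x) { return x + 1; }
>> /* { dg-final { scan-assembler "adds" } } */
>>
>> The pragma would pin the compilation state for the test, whatever
>> -mcpu/-mfpu options the multilib under test puts on the command
>> line.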
> I'm not sure I fully understand what you mean: if we add #pragma CPU
> XXX to a test, for instance, and then run the tests with -mcpu=YYY,
> the test will still be compiled for XXX, right? How would we detect
> that the generated code is wrong when compiling for YYY?
>

That's a separate test. You either accept what's on the command line
for the multilib, or you have a test that essentially ignores the
command-line options (but is a compile-to-asm-only test). You can't
have it both ways without the mess we have now.

>> It's a fair amount of work, though, since many of the required
>> options cannot be controlled today via the attributes. It's also
>> not entirely

> Indeed!
>
> Not to mention that we would also have to decorate the many existing
> tests.

>> clear whether these should be exposed to users, since in most cases
>> such control is unlikely to be of use in real code.

> Probably indeed.
>
> For the record, I've changed the way I run the validations for
> arm-eabi, as I described in my original email: I now use the default
> cpu/fpu/mode at GCC configure time, enable the relevant multilibs,
> and override the compilation flags when running the tests.
>
> For instance: -mthumb/-mcpu=cortex-m33/-mfloat-abi=hard
>
> The number of failures is now lower than it used to be when
> configuring --with-cpu=cortex-m33.
>
> Christophe

R.