Re: [arm] GCC validation: preferred way of running the testsuite?

Richard Earnshaw Tue, 19 May 2020 04:29:10 -0700

On 11/05/2020 17:43, Christophe Lyon via Gcc wrote:
> Hi,
> 
> 
> As you may know, I've been running validations of GCC trunk in many
> configurations for Arm and Aarch64.
> 
> 
> I was recently trying to make some cleanup in the new Bfloat16, MVE, CDE, and
> ACLE tests because in several configurations I see 300-400 FAILs
> mainly in these areas, because of “testisms”. The goal is to avoid
> wasting time over the same failure reports when checking what needs
> fixing. I thought this would be quick & easy, but this is tedious
> because of the numerous combinations of options and configurations
> available on Arm.
> 
> 
> Sorry for the very long email, it’s hard to describe and summarize,
> but I'd like to try nonetheless, hoping that we can make testing
> easier/more efficient :-), because most of the time the problems I
> found are with the tests rather than real compiler bugs, so I think
> it's a bit of wasted time.
> 
> 
> Here is a list of problems, starting with the tricky dependencies
> around -mfloat-abi=XXX:
> 
> * Some targets do not support multilibs (eg arm-linux-gnueabi[hf] with
> glibc), or one can decide not to build with both hard and soft FP
> multilibs. This generally becomes a problem when including stdint.h
> (used by arm_neon.h, arm_acle.h, …), leading to a compiler error for
> lack of gnu/stub*.h for the missing float-abi. If you add -mthumb to
> the picture, it becomes quite complex (eg -mfloat-abi=hard is not
> supported on thumb-1).
> 
> 
> Consider mytest.c that does not depend on any include file and has:
> /* { dg-options "-mfloat-abi=hard" } */
> 
> If GCC is configured for arm-linux-gnueabi --with-cpu=cortex-a9
> --with-fpu=neon, with ‘make check’, the test PASSes.
> With ‘make check’ with --target-board=-march=armv5t/-mthumb, then the
> test FAILs:
> sorry, unimplemented: Thumb-1 hard-float VFP ABI
> 
> 
> If I add
> /* { dg-require-effective-target arm_hard_ok } */
> ‘make check’ with --target-board=-march=armv5t/-mthumb is now
> UNSUPPORTED (which is OK), but
> plain ‘make check’ is now also UNSUPPORTED because arm_hard_ok detects
> that we lack the -mfloat-abi=hard multilib. So we lose a PASS.
> 
> If I configure GCC for arm-linux-gnueabihf, then:
> ‘make check’ PASSes
> ‘make check’ with --target-board=-march=armv5t/-mthumb, FAILs
> and with
> /* { dg-require-effective-target arm_hard_ok } */
> ‘make check’ with --target-board=-march=armv5t/-mthumb is now UNSUPPORTED and
> plain ‘make check’ PASSes
> 
> So it seems the best option is to add
> /* { dg-require-effective-target arm_hard_ok } */
> although it makes the test UNSUPPORTED by arm-linux-gnueabi even in
> cases where it could PASS.
> 
> Is there consensus that this is the right way?
> 
> 
> 
> * In GCC DejaGnu helpers, the queries for -mfloat-abi=hard and
> -march=XXX are independent in general, meaning if you query for
> -mfloat-abi=hard support, it will do that in the absence of any
> -march=XXX that the testcase may also be using. So, if GCC is
> configured with its default cpu/fpu, -mfloat-abi=hard will be rejected
> for lack of an fpu on the default cpu, but if GCC is configured with a
> suitable cpu/fpu pair, -mfloat-abi=hard will be accepted.
> 
> I faced this problem when I tried to “fix” the order in which we try
> options in arm_v8_2a_bf16_neon_ok. (see
> https://gcc.gnu.org/pipermail/gcc-patches/2020-April/544654.html)
> 
> I faced similar problems while working on a patch of mine about a bug
> with IRQ handlers which has different behaviour depending on the FP
> ABI used: I have the feeling that I spend too much time writing the
> tests to the detriment of the patch itself...
> 
> I also noticed that Richard Sandiford probably faced similar issues
> with his recent fix for "no_unique_address", where he finally added
> arm_arch_v8a_hard_ok to check armv8-a CPU + neon-fp-armv8 FPU +
> float-abi=hard at the same time.
> 
> Maybe we could decide on a consistent and simpler way of checking such things?
> 
> 
> * A metric for this complexity could be the number of arm
> effective-targets, a quick and not-fully accurate grep | sed | sort |
> uniq -c | sort -n on target-supports.exp ends with:
>       9 mips
>      16 aarch64
>      21 powerpc
>      97 vect
>     106 arm
> (does not count all the effective-targets generated by tcl code, eg
> arm_arch_FUNC_ok)
> 
> This probably explains why it’s hard to get test directives right :-)
> 
> I’ve not thought about how we could reduce that number….
> 
> 
> 
> * Finally, I’m wondering about the most appropriate way of configuring
> GCC and running the tests.
> 
> So far, for most of the configurations I'm testing, I use different
> --with-cpu/--with-fpu/--with-mode configure flags for each toolchain
> configuration I’m testing and rarely override the flags at testing
> time. I also disable multilibs to save build time and (scratch) disk
> space. (See 
> https://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/0latest/report-build-info.html
> for the current list, each line corresponds to a clean build + make
> check job -- so there are 15 different toolchain configs for
> arm-linux-gnueabihf for instance)
> 
> However, I think this may not be appropriate, at least for the
> arm-eabi toolchains, because I suspect the vendors who support several
> SoCs generally ship one binary toolchain built with the default
> cpu/fpu/mode and the appropriate multilibs (aprofile or rmprofile),
> and the associated IDE adds the right -mcpu/-mfpu flags (see
> arm-embedded toolchain, ST CubeMX for stm32). So it seems to me that
> the "appropriate" way of testing such a toolchain is to build it with
> the default settings and appropriate multilibs and add the needed
> -mcpu/-mfpu variants at 'make check' time.
> 
> I would still build one toolchain per configuration I want to test and
> not use runtest’s capability to iterate over several combinations:
> this way I can run the tests in parallel and reduce the total time
> needed to get the results.
> 
> One can compare the results of both options with the two lines with
> cortex-m33 in the above table (target arm-none-eabi).
> 
> In the first one, GCC is configured for cortex-m33, and tests executed
> via plain ‘make check’: 401 failures in gcc. (duration ~2h, disk space
> 14GB)
> 
> In the 2nd line, GCC is configured with the default cpu/fpu, multilibs
> enabled and I use test flags suitable for cortex-m33: now only 73
> failures for gcc. (duration ~3h15, disk space 26GB). Note that there
> are more failures for g++ and libstdc++ than for the previous line, I
> haven’t fully checked why -- for libstdc++ there are spurious
> -march=armv8-m.main+fp flags in the log. So this is not the magic
> bullet.
> 
> 
> Unfortunately, this means every test with arm_hard_ok effective target
> would be unsupported (lack of fpu on default cpu) whatever the
> validation cflags. The increased build time (many multilibs built for
> nothing) will also reduce the validation bandwidth (I hope the
> increased scratch disk space will not be a problem with my IT…)
> 
> 
> 
> OTOH, I have a feeling that arm-linux-gnueabi* toolchain vendors
> probably prefer to tune them for their preferred default CPU. For
> instance I have an arm board running Ubuntu with gcc-5.4 configured
> --with-arch=armv7-a --with-fpu=vfpv3-d16 --with-float=hard
> --with-mode=thumb.
> 
> If this is right, it would mean I should keep the configurations I
> currently use for arm-linux* (no multilib, rely on default cpu/fpu).
> 
> ** Regarding the flags used for testing, I’m also wondering what’s the
> most appropriate: -mcpu or -march. Both probably have pros and cons?
> 
> In https://gcc.gnu.org/pipermail/gcc/2019-September/230258.html, I
> described a problem where it seems that one expects the tests to run
> with -march=XXX.
> 
> Another log of mine has an effective-target helper compiled with:
> -mthumb -mcpu=cortex-m33 -mfloat-abi=hard -mfloat-abi=softfp
> -mfpu=auto -march=armv8.1-m.main+mve.fp -mthumb
> which produces this warning:
> cc1: warning: switch '-mcpu=cortex-m33' conflicts with
> '-march=armv8.1-m.main' switch
> which looks suspicious: running the tests in multiple ways surely
> helps uncover bugs….
> 
> 
> In summary, I’d like to gather opinions on:
> * appropriate usage of dg-require-effective-target arm_hard_ok
> * how to improve float-abi support detection in combination with
> architecture level
> * hopefully consensus on choosing how to configure the toolchain and
> run the tests. I’m suggesting default config + multilibs +
> runtest-flags for arm-eabi and a selection of default cpu/fpu + less
> runtest-flags for arm-linux*.
> 
> 
> Thanks for reading that far :-)
> 
> 
> Christophe
>


I've been pondering this for some time now (well before you sent your mail).

My feeling is that trying to control this via dejagnu options is just
getting too fiddly.  Perhaps a new approach is called for.

My thoughts are along the lines of reworking the tests to use

  #pragma GCC target ("<option>")

etc (or the attribute equivalent), to set the compilation state to
something appropriate for the test so that the output is reasonable for
that and then we can stabilize the test.
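
As a rough sketch of what such a test might look like, using the target
attribute that GCC already accepts on Arm for selecting Thumb state: the
function name, the TEST_TARGET macro and the scan pattern are
illustrative assumptions, not taken from an existing test.

```c
/* { dg-do compile } */
/* Illustrative pattern only: Thumb functions are normally introduced
   by a .thumb_func directive in ELF assembly output.  */
/* { dg-final { scan-assembler "\\.thumb_func" } } */

/* Apply the attribute only when GCC is targeting Arm, so that the
   file still compiles on other hosts.  */
#if defined(__GNUC__) && !defined(__clang__) && defined(__arm__)
#define TEST_TARGET __attribute__((target("thumb")))
#else
#define TEST_TARGET
#endif

/* Hypothetical function under test: the attribute pins the
   instruction-set state for this function, whatever -marm/-mthumb
   appears in the test flags.  */
TEST_TARGET int
select_value (int a)
{
  return a ? 1 : 5;
}
```

The point is that the state the scan depends on travels with the source
file instead of being negotiated through dg-options and
effective-target checks.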

It only works for assembly tests, not for anything that requires linking
or execution: but for those tests we shouldn't be looking for a specific
output but a specific behaviour and we can tolerate more variation in
the instructions that implement that behaviour (hybrid tests would need
splitting).
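
For the execution side, a sketch of the kind of check I mean: it looks
only at the computed values and says nothing about which instructions
produce them. The helper names are hypothetical, and a real
"dg-do run" test would call the checker from main and abort() on a
nonzero result.

```c
/* Behavioural check: only the computed values matter, so the test
   does not care which -mcpu/-mfpu/-mfloat-abi flags are in effect.  */

/* noinline discourages the compiler from folding the call away.  */
static int __attribute__((noinline))
scale_and_round (float x, int factor)
{
  return (int) (x * (float) factor + 0.5f);
}

/* Returns the number of failed checks.  */
int
count_failures (void)
{
  int failures = 0;

  if (scale_and_round (1.5f, 3) != 5)
    failures++;
  if (scale_and_round (2.0f, 4) != 8)
    failures++;

  return failures;
}
```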

It's a fair amount of work, though, since many of the required options
cannot be controlled today via the attributes.  It's also not entirely
clear whether these should be exposed to users, since in most cases such
control is unlikely to be of use in real code.

R.
