On 26/05/2020 18:04, Christophe Lyon via Gcc wrote:
> On Tue, 19 May 2020 at 13:28, Richard Earnshaw
> <richard.earns...@foss.arm.com> wrote:
>>
>> On 11/05/2020 17:43, Christophe Lyon via Gcc wrote:
>>> Hi,
>>>
>>> As you may know, I've been running validations of GCC trunk in many
>>> configurations for Arm and AArch64.
>>>
>>> I was recently trying to do some cleanup in the new Bfloat16, MVE,
>>> CDE and ACLE tests, because in several configurations I see 300-400
>>> FAILs, mainly in these areas, caused by "testisms". The goal is to
>>> avoid wasting time on the same failure reports when checking what
>>> needs fixing. I thought this would be quick and easy, but it is
>>> tedious because of the numerous combinations of options and
>>> configurations available on Arm.
>>>
>>> Sorry for the very long email; this is hard to describe and
>>> summarize, but I'd like to try nonetheless, hoping that we can make
>>> testing easier and more efficient :-). Most of the time the
>>> problems I found are with the tests rather than real compiler bugs,
>>> so quite some time is wasted on them.
>>>
>>> Here is a list of problems, starting with the tricky dependencies
>>> around -mfloat-abi=XXX:
>>>
>>> * Some targets do not support multilibs (e.g. arm-linux-gnueabi[hf]
>>> with glibc), or one can decide not to build both the hard and soft
>>> FP multilibs. This generally becomes a problem when including
>>> stdint.h (used by arm_neon.h, arm_acle.h, ...), leading to a
>>> compiler error for lack of gnu/stub*.h for the missing float-abi.
>>> If you add -mthumb to the picture, it becomes quite complex (e.g.
>>> -mfloat-abi=hard is not supported on Thumb-1).
>>>
>>> Consider mytest.c, which does not depend on any include file and
>>> has:
>>> /* { dg-options "-mfloat-abi=hard" } */
>>>
>>> If GCC is configured for arm-linux-gnueabi --with-cpu=cortex-a9
>>> --with-fpu=neon, the test PASSes with plain 'make check'.
>>> With 'make check' with --target_board=-march=armv5t/-mthumb, the
>>> test FAILs:
>>> sorry, unimplemented: Thumb-1 hard-float VFP ABI
>>>
>>> If I add
>>> /* { dg-require-effective-target arm_hard_ok } */
>>> 'make check' with --target_board=-march=armv5t/-mthumb is now
>>> UNSUPPORTED (which is OK), but plain 'make check' is now also
>>> UNSUPPORTED, because arm_hard_ok detects that we lack the
>>> -mfloat-abi=hard multilib. So we lose a PASS.
>>>
>>> If I configure GCC for arm-linux-gnueabihf instead, then:
>>> plain 'make check' PASSes;
>>> 'make check' with --target_board=-march=armv5t/-mthumb FAILs;
>>> and with
>>> /* { dg-require-effective-target arm_hard_ok } */
>>> 'make check' with --target_board=-march=armv5t/-mthumb is now
>>> UNSUPPORTED and plain 'make check' PASSes.
>>>
>>> So it seems the best option is to add
>>> /* { dg-require-effective-target arm_hard_ok } */
>>> although it makes the test UNSUPPORTED on arm-linux-gnueabi even in
>>> cases where it could PASS.
>>>
>>> Is there consensus that this is the right way?
>>>
>>> * In GCC's DejaGnu helpers, the queries for -mfloat-abi=hard and
>>> -march=XXX are generally independent: if you query for
>>> -mfloat-abi=hard support, the helper does so in the absence of any
>>> -march=XXX option that the testcase may also be using. So, if GCC
>>> is configured with its default cpu/fpu, -mfloat-abi=hard is
>>> rejected for lack of an FPU on the default CPU, but if GCC is
>>> configured with a suitable cpu/fpu pair, -mfloat-abi=hard is
>>> accepted.
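>>>
>>> For illustration, a complete (made-up) testcase combining the two
>>> kinds of directives would be:
>>>
>>> /* { dg-do compile } */
>>> /* { dg-require-effective-target arm_hard_ok } */
>>> /* { dg-options "-march=armv8-a -mfloat-abi=hard" } */
>>> int f (void) { return 0; }
>>>
>>> Here arm_hard_ok checks -mfloat-abi=hard on its own; it knows
>>> nothing about the -march=armv8-a in dg-options, so its verdict may
>>> not match what happens when the test itself is compiled.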
>>>
>>> I faced this problem when I tried to "fix" the order in which we
>>> try options in arm_v8_2a_bf16_neon_ok (see
>>> https://gcc.gnu.org/pipermail/gcc-patches/2020-April/544654.html).
>>>
>>> I faced similar problems while working on a patch of mine for a bug
>>> in IRQ handlers, whose behaviour differs depending on the FP ABI in
>>> use: I have the feeling that I spent too much time writing the
>>> tests to the detriment of the patch itself...
>>>
>>> I also noticed that Richard Sandiford probably faced similar issues
>>> with his recent fix for "no_unique_address", where he finally added
>>> arm_arch_v8a_hard_ok to check armv8-a + the neon-fp-armv8 FPU +
>>> float-abi=hard at the same time.
>>>
>>> Maybe we could decide on a consistent and simpler way of checking
>>> such things?
>>>
>>> * A metric for this complexity could be the number of arm
>>> effective-targets; a quick and not fully accurate
>>> grep | sed | sort | uniq -c | sort -n
>>> on target-supports.exp ends with:
>>> 9 mips
>>> 16 aarch64
>>> 21 powerpc
>>> 97 vect
>>> 106 arm
>>> (this does not count the effective-targets generated by Tcl code,
>>> e.g. arm_arch_FUNC_ok)
>>>
>>> This probably explains why it's hard to get test directives right
>>> :-)
>>>
>>> I've not thought about how we could reduce that number...
>>>
>>> * Finally, I'm wondering about the most appropriate way of
>>> configuring GCC and running the tests.
>>>
>>> So far, for most of the configurations I test, I use different
>>> --with-cpu/--with-fpu/--with-mode configure flags for each
>>> toolchain configuration, and I rarely override the flags at testing
>>> time. I also disable multilibs to save build time and (scratch)
>>> disk space. (See
>>> https://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/0latest/report-build-info.html
>>> for the current list; each line corresponds to a clean build +
>>> 'make check' job -- so there are 15 different toolchain configs for
>>> arm-linux-gnueabihf, for instance.)
>>>
>>> However, I think this may not be appropriate, at least for the
>>> arm-eabi toolchains: I suspect that vendors who support several
>>> SoCs generally ship one binary toolchain built with the default
>>> cpu/fpu/mode and the appropriate multilibs (aprofile or rmprofile),
>>> and the associated IDE adds the right -mcpu/-mfpu flags (see the
>>> Arm Embedded toolchain, or ST CubeMX for STM32). So it seems to me
>>> that the "appropriate" way of testing such a toolchain is to build
>>> it with the default settings and appropriate multilibs, and to add
>>> the needed -mcpu/-mfpu variants at 'make check' time.
>>>
>>> I would still build one toolchain per configuration I want to
>>> test, rather than use runtest's capability to iterate over several
>>> combinations: this way I can run the tests in parallel and reduce
>>> the total time needed to get the results.
>>>
>>> One can compare the results of both options with the two cortex-m33
>>> lines in the above table (target arm-none-eabi).
>>>
>>> In the first one, GCC is configured for cortex-m33, and the tests
>>> are executed via plain 'make check': 401 failures in gcc (duration
>>> ~2h, disk space 14GB).
>>>
>>> In the second one, GCC is configured with the default cpu/fpu,
>>> multilibs are enabled, and I use test flags suitable for
>>> cortex-m33: now only 73 failures for gcc (duration ~3h15, disk
>>> space 26GB). Note that there are more failures for g++ and
>>> libstdc++ than in the previous line; I haven't fully checked why --
>>> for libstdc++ there are spurious -march=armv8-m.main+fp flags in
>>> the log. So this is not the magic bullet.
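>>>
>>> Concretely, the two setups are along these lines (a sketch, not my
>>> exact commands -- configure paths and extra options are omitted):
>>>
>>> # 1st line: toolchain dedicated to cortex-m33, plain test run
>>> .../configure --target=arm-none-eabi --with-cpu=cortex-m33 \
>>>   --disable-multilib ...
>>> make check
>>>
>>> # 2nd line: default cpu/fpu, multilibs built, flags at test time
>>> .../configure --target=arm-none-eabi --with-multilib-list=rmprofile ...
>>> make check \
>>>   RUNTESTFLAGS="--target_board=-mthumb/-mcpu=cortex-m33/-mfloat-abi=hard"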
>>>
>>> Unfortunately, this means every test with the arm_hard_ok
>>> effective-target would be unsupported (for lack of an FPU on the
>>> default CPU) whatever the validation cflags. The increased build
>>> time (many multilibs built for nothing) will also reduce the
>>> validation bandwidth (I hope the increased scratch disk space will
>>> not be a problem with my IT...).
>>>
>>> OTOH, I have a feeling that arm-linux-gnueabi* toolchain vendors
>>> probably prefer to tune them for their preferred default CPU. For
>>> instance, I have an Arm board running Ubuntu with gcc-5.4
>>> configured --with-arch=armv7-a --with-fpu=vfpv3-d16
>>> --with-float=hard --with-mode=thumb.
>>>
>>> If this is right, it would mean I should keep the configurations I
>>> currently use for arm-linux* (no multilib, rely on the default
>>> cpu/fpu).
>>>
>>> ** Regarding the flags used for testing, I'm also wondering which
>>> is more appropriate: -mcpu or -march. Both probably have pros and
>>> cons.
>>>
>>> In https://gcc.gnu.org/pipermail/gcc/2019-September/230258.html, I
>>> described a problem where it seems that one expects the tests to
>>> run with -march=XXX.
>>>
>>> Another log of mine has an effective-target helper compiled with:
>>> -mthumb -mcpu=cortex-m33 -mfloat-abi=hard -mfloat-abi=softfp
>>> -mfpu=auto -march=armv8.1-m.main+mve.fp -mthumb
>>> which produces this warning:
>>> cc1: warning: switch '-mcpu=cortex-m33' conflicts with
>>> '-march=armv8.1-m.main' switch
>>> which looks suspicious. Still, running the tests in multiple ways
>>> surely helps uncover bugs...
>>>
>>> In summary, I'd like to gather opinions on:
>>> * the appropriate usage of dg-require-effective-target arm_hard_ok;
>>> * how to improve float-abi support detection in combination with
>>> the architecture level;
>>> * hopefully, a consensus on how to configure the toolchain and run
>>> the tests. I'm suggesting default config + multilibs + runtest
>>> flags for arm-eabi, and a chosen default cpu/fpu + fewer runtest
>>> flags for arm-linux*.
>>>
>>> Thanks for reading that far :-)
>>>
>>> Christophe
>>>
>>
>
> Thanks for your answer.
>
>> I've been pondering this for some time now (well before you sent
>> your mail).
>>
>> My feeling is that trying to control this via DejaGnu options is
>> just getting too fiddly. Perhaps a new approach is called for.
>>
>> My thoughts are along the lines of reworking the tests to use
>>
>> #pragma target <option>
>>
>> etc. (or the attribute equivalent) to set the compilation state to
>> something appropriate for the test, so that the output is reasonable
>> for that, and then we can stabilize the test.
>>
>> This only works for assembly tests, not for anything that requires
>> linking or execution: but for those tests we shouldn't be looking
>> for a specific output but for a specific behaviour, and we can
>> tolerate more variation in the instructions that implement that
>> behaviour (hybrid tests would need splitting).
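>>
>> For instance, a scan-assembler test might then look something like
>> this (purely a sketch: the option spellings and the scan pattern are
>> invented, and as noted further down most of the required options
>> cannot be controlled this way today):
>>
>> /* { dg-do compile } */
>> #pragma GCC target ("arch=armv8.1-m.main+mve,float-abi=hard")
>> int f (int x) { return x + 1; }
>> /* { dg-final { scan-assembler "adds" } } */
>>
>> The pragma would pin the compilation state for the test, whatever
>> -mcpu/-mfpu options the multilib under test puts on the command
>> line.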
> I'm not sure I fully understand what you mean: if we add #pragma CPU
> XXX to a test, for instance, and then run the tests with -mcpu=YYY,
> the test will still be compiled for XXX, right? How would we detect
> that the generated code is wrong when compiling for YYY?
>

That's a separate test. You either accept what's on the command line
for the multilib, or you have a test that essentially ignores the
command-line options (but is a compile-to-asm-only test). You can't
have it both ways without the mess we have now.

>> It's a fair amount of work, though, since many of the required
>> options cannot be controlled today via the attributes. It's also
>> not entirely

> Indeed!
>
> Not to mention that we would also have to decorate the many existing
> tests.

>> clear whether these should be exposed to users, since in most cases
>> such control is unlikely to be of use in real code.

> Probably indeed.
>
> For the record, I've changed the way I run the validations for
> arm-eabi, as I described in my original email: I now use the default
> cpu/fpu/mode at GCC configure time, enable the relevant multilibs,
> and override the compilation flags when running the tests.
>
> For instance: -mthumb/-mcpu=cortex-m33/-mfloat-abi=hard
>
> The number of failures is now lower than it used to be when
> configuring --with-cpu=cortex-m33.
>
> Christophe

R.