On Wed, 7 Sep 2011, Diego Novillo wrote:

> One of the most vexing aspects of GCC development is dealing with
> failures in the various testsuites.  In general, we are unable to
> keep failures down to zero.  We tolerate some failures and tell
> people to "compare your build against a clean build".
> 
> This forces developers to either double their testing time by
> building the compiler twice or search in gcc-testresults and hope
> to find a relatively similar build to compare against.

I don't think you can sensibly avoid needing to build the compiler twice.  
Even if the expected state was no failures yesterday, during development 
Stage 1 it's quite likely that a combination of patches committed since 
then has changed the expected state.  Regression testers such as HJ's 
certainly help in identifying such new failures promptly, though, and we 
could use more such testers on more targets (but they do need a person 
monitoring them and filing PRs).

> Additionally, the marking mechanisms in DejaGNU are generally
> cumbersome and hard to add.  Even worse, depending on the
> controlling script, there may not be an XFAIL marker at all.

Actually, I think they work well in GCC, given the work Janis did some 
years ago to allow precise specification of the conditions of XFAILing, 
effective-target names, etc. - especially when you are doing non-multilib 
testing (for multilib testing, core DejaGNU can get in the way because 
the multilib options come *after* those in dg-options on the command 
line, which complicates XFAILing).
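
For illustration, in-test markings of that kind look like the 
following - a sketch only, where the target triplet and the 
effective-target keyword are placeholders rather than a real known 
failure:

  /* { dg-do run { xfail hppa*-*-* } } */
  /* { dg-require-effective-target trampolines } */

The xfail selector can equally be an expression combining target 
triplets and effective-target keywords with &&, || and !.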

The most obvious oddity is that gcc.c-torture/execute uses separate .x 
files instead of the dg- harness (see PR 20567).
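
(Such a .x file is a Tcl fragment read by the torture harness; a 
minimal one, with a placeholder triplet, looks roughly like this:

  # Expect this test to fail at execution time on the given target.
  set torture_execute_xfail "xstormy16-*-*"
  return 0

where returning 1 instead would skip the test entirely.)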

To my mind, the point of an on-the-side mechanism for identifying known 
failures, separate from the in-test XFAILs, is for failures that depend on 
some machine-specific aspect of the test environment (e.g. the amount of 
memory on the target, or the amount of stack space on the host) - that is, 
for information it would not be appropriate to check in.  If the 
conditions of the failure are well-enough characterised to check in 
something saying when the failure is known, then that something can be 
represented as an XFAIL rather than having two different ways to represent 
it.
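
Concretely - a sketch, with "PR nnnnn" and the option list as 
placeholders - if a failure is known to occur precisely for, say, 
ILP32 targets at -O0, the test itself can carry

  /* { dg-xfail-run-if "PR nnnnn" { ilp32 } { "-O0" } } */

whereas "fails when the board has less than some amount of memory" has 
no such expression that would be appropriate to check in.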

> - Supports flaky tests.

Flaky tests are a problem (including for regression testers identifying 
regressions and filing PRs); I'm inclined to think that if a test is flaky 
for non-machine-specific reasons, it should be fixed or promptly disabled 
by default (with a PR filed about the flakiness), rather than being left 
active in a flaky state.  There could be a GCC_TEST_RUN_FLAKY environment 
variable to enable running such tests to see if they have stopped being 
flaky.
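
As a sketch of how that opt-in might work (GCC_TEST_RUN_FLAKY is only 
proposed here; nothing implements it), a known-flaky torture test's .x 
file could carry something like:

  # Skip this known-flaky test unless the proposed opt-in variable is
  # set; returning 1 tells the harness not to run the test.
  if { ![info exists ::env(GCC_TEST_RUN_FLAKY)] } {
      return 1
  }
  return 0

with something analogous in the dg- based harnesses.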

-- 
Joseph S. Myers
jos...@codesourcery.com
