On Wed, 7 Sep 2011, Diego Novillo wrote:

> One of the most vexing aspects of GCC development is dealing with
> failures in the various testsuites.  In general, we are unable to
> keep failures down to zero.  We tolerate some failures and tell
> people to "compare your build against a clean build".
>
> This forces developers to either double their testing time by
> building the compiler twice or search in gcc-testresults and hope
> to find a relatively similar build to compare against.
I don't think you can sensibly avoid needing to build the compiler
twice.  Even if the expected state was no failures yesterday, during
development Stage 1 it's quite likely that some combination of patches
committed since then has changed the expected state.  Regression
testers such as HJ's certainly help in identifying such new failures
promptly, and we could certainly use more such testers on more targets
(but they do need a person monitoring them and filing PRs).

> Additionally, the marking mechanisms in DejaGNU are generally
> cumbersome and hard to add.  Even worse, depending on the
> controlling script, there may not be an XFAIL marker at all.

Actually, I think they work well in GCC, given the work Janis did some
years ago to allow precise specification of the conditions for
XFAILing, effective-target names, etc. - especially when you are doing
non-multilib testing (for multilib testing, core DejaGNU can get in
the way because the multilib options come *after* those in dg-options
on the command line, which complicates XFAILing).  The most obvious
oddity is that gcc.c-torture/execute uses separate .x files instead of
the dg- harness (see PR 20567).

To my mind, the point of an on-the-side mechanism for identifying
known failures, separate from the in-test XFAILs, is for failures that
depend on some machine-specific aspect of the test environment
(e.g. the amount of memory on the target, or the amount of stack space
on the host) - that is, for information it would not be appropriate to
check in.  If the conditions of the failure are well enough
characterised to check in something saying when the failure is known,
then that something can be represented as an XFAIL rather than having
two different ways to represent it.

> - Supports flaky tests.
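As a sketch of what such an in-test marking looks like (the dg-do and
dg-xfail-if directive names and the ilp32 effective-target keyword are
real dg- harness syntax; the test body and the PR reference in the
comment string are invented for illustration):

```c
/* Hypothetical testcase using the dg- harness's XFAIL support.
   The dg-xfail-if selector names an effective-target keyword
   (here ilp32, i.e. 32-bit int/long/pointer targets) to state
   precisely where the failure is expected; the comment string
   would normally cite the PR tracking the failure.  */
/* { dg-do run } */
/* { dg-xfail-if "hypothetical PR" { ilp32 } } */

int
main (void)
{
  return 0;
}
```

The point is that the expectation is checked in next to the test
itself, with the conditions spelled out via effective-target keywords
rather than recorded in a separate on-the-side list.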
Flaky tests are a problem (including for regression testers
identifying regressions and filing PRs); I'm inclined to think that if
a test is flaky for non-machine-specific reasons, it should be fixed
or promptly disabled by default (with a PR filed about the flakiness),
rather than being left active in a flaky state.  There could be a
GCC_TEST_RUN_FLAKY environment variable to enable running such tests
to see if they have stopped being flaky.

-- 
Joseph S. Myers
jos...@codesourcery.com