On 11 October 2017 at 08:34, Paulo Matos <pmatos@linki.tools> wrote:
>
>
> On 10/10/17 23:25, Joseph Myers wrote:
>> On Tue, 10 Oct 2017, Paulo Matos wrote:
>>
>>> new test -> FAIL ; New test starts as fail
>>
>> No, that's not a regression, but you might want to treat it as one (in the
>> sense that it's a regression at the higher level of "testsuite run should
>> have no unexpected failures", even if the test in question would have
>> failed all along if added earlier and so the underlying compiler bug, if
>> any, is not a regression).  It should have human attention to classify it
>> and either fix the test or XFAIL it (with issue filed in Bugzilla if a
>> bug), but it's not a regression.  (Exception: where a test failing results
>> in its name changing, e.g. through adding "(internal compiler error)".)
>>
>
> When someone adds a new test to the testsuite, isn't it supposed not to
> FAIL? If it does FAIL, shouldn't this be considered a regression?
>
> Now, the danger is that, since regressions are comparisons with the
> previous run, something like this would happen:
>
> run1:
> ...
> FAIL: foo.c ; new test
> ...
>
> run1 fails because the new test entered as a FAIL
>
> run2:
> ...
> FAIL: foo.c
> ...
>
> run2 succeeds because there are no changes.
>
> For this reason, all of these issues need to be taken care of straight
> away, or they become part of the 'normal' status and no further failures
> are reported... unless, of course, a more complex regression analysis is
> implemented.
>

Agreed.
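To make the danger concrete, here is a minimal sketch in Python of a
comparison against the previous run only. The parse_sum and new_fails
helpers are purely illustrative (they are not part of any existing GCC or
buildbot script), and the .sum parsing is deliberately simplified:

    def parse_sum(path):
        """Return a {test name: status} dict from a DejaGnu .sum file (simplified)."""
        statuses = ("PASS", "FAIL", "XFAIL", "XPASS",
                    "UNRESOLVED", "UNSUPPORTED", "UNTESTED")
        results = {}
        with open(path) as f:
            for line in f:
                for status in statuses:
                    prefix = status + ": "
                    if line.startswith(prefix):
                        results[line[len(prefix):].strip()] = status
                        break
        return results

    def new_fails(previous, current):
        """Tests that FAIL now but did not FAIL in the previous run."""
        return sorted(name for name, status in current.items()
                      if status == "FAIL" and previous.get(name) != "FAIL")

    # run1 introduces foo.c as a FAIL, so new_fails(run0, run1) reports it
    # once.  In run2, foo.c still FAILs, but new_fails(run1, run2) is empty:
    # compared only against the previous run, the failure has quietly become
    # part of the "normal" baseline.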
> Also, when I say run1 fails or succeeds, this is just the term I use to
> display red/green in the buildbot interface for a given build, not
> necessarily what I expect the process to do.
>
>>
>> My suggestion is:
>>
>> PASS -> FAIL is an unambiguous regression.
>>
>> Anything else -> FAIL and new FAILing tests aren't regressions at the
>> individual test level, but may be treated as such at the whole testsuite
>> level.
>>
>> Any transition where the destination result is not FAIL is not a
>> regression.
>>

FWIW, we consider the following to be regressions (a rough sketch of how
these transitions could be classified follows at the end of this mail):
* any -> FAIL, because we don't want such a regression at the whole
  testsuite level
* any -> UNRESOLVED, for the same reason
* {PASS,UNSUPPORTED,UNTESTED,UNRESOLVED} -> XPASS
* new XPASS
* an XFAIL disappears (this may mean that a testcase was removed, so it is
  worth a manual check)
* ERRORs

>> ERRORs in the .sum or .log files should be watched out for as well,
>> however, as sometimes they may indicate broken Tcl syntax in the
>> testsuite, which may cause many tests not to be run.
>>
>> Note that the test names that come after PASS:, FAIL: etc. aren't unique
>> between different .sum files, so you need to associate tests with a tuple
>> (.sum file, test name) (and even then, sometimes multiple tests in a .sum
>> file have the same name, but that's a testsuite bug).  If you're using
>> --target_board options that run tests for more than one multilib in the
>> same testsuite run, add the multilib to that tuple as well.
>>
>
> Thanks for all the comments. Sounds sensible.
> By not being unique, you mean between languages?

Yes, but not only, as Joseph mentioned above. You also have the obvious
example of the c-c++-common/*san tests, which are common to gcc and g++.

> I assume that two gcc.sum files from different builds will always refer
> to the same test/configuration when referring to (for example):
> PASS: gcc.c-torture/compile/20000105-1.c -O1 (test for excess errors)
>
> In this case, I assume that "gcc.c-torture/compile/20000105-1.c -O1
> (test for excess errors)" will always be referring to the same thing.
>

In gcc.sum, I can see 4 occurrences of
PASS: gcc.dg/Werror-13.c (test for errors, line )
Actually, there are quite a few others like that....

Christophe

> --
> Paulo Matos
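Here is the rough sketch of the transition rules listed above, again in
Python. The is_regression and regressions names are illustrative only, not
taken from any existing GCC script; results are keyed on a
(.sum file, test name) tuple as Joseph suggests, and the multilib could be
added as a third element of that key. ERROR lines in the .sum/.log files
would still need a separate check:

    FAILISH = ("FAIL", "UNRESOLVED")

    def is_regression(old_status, new_status):
        """Classify one (old, new) transition; old_status is None for a new test."""
        if new_status in FAILISH:
            # any -> FAIL and any -> UNRESOLVED, including new tests.
            return old_status != new_status
        if new_status == "XPASS":
            # {PASS,UNSUPPORTED,UNTESTED,UNRESOLVED} -> XPASS, and new XPASS.
            return old_status in (None, "PASS", "UNSUPPORTED",
                                  "UNTESTED", "UNRESOLVED")
        return False

    def regressions(previous, current):
        """previous and current map (sum file, test name) -> status."""
        found = [(key, previous.get(key), status)
                 for key, status in current.items()
                 if is_regression(previous.get(key), status)]
        # A disappearing XFAIL may just mean the testcase was removed,
        # so flag it for a manual check rather than dropping it silently.
        found.extend((key, "XFAIL", None)
                     for key, status in previous.items()
                     if status == "XFAIL" and key not in current)
        return found

Keying on the tuple rather than the bare test name is what lets the same
name appear in both gcc.sum and g++.sum (as with the c-c++-common/*san
tests) without the two runs clashing.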