Erick Erickson created SOLR-12016:
-------------------------------------
Summary: Reduce noise from flakey tests
Key: SOLR-12016
URL: https://issues.apache.org/jira/browse/SOLR-12016
Project: Solr
Issue Type: Improvement
Security Level: Public (Default Security Level. Issues are Public)
Components: Tests
Affects Versions: 7.2, master (8.0)
Reporter: Erick Erickson
Assignee: Erick Erickson
We discussed this topic on the dev list; look for the thread titled
"Test failures are out of control.....". I'll try to summarize that discussion
here so we can move this JIRA forward. This may become an umbrella issue.
Current situation concerns:
> There is so much noise from flakey tests (particularly Solr tests) that test
> results are difficult to use.
> The number of tests that regularly fail is increasing.
> Failures are being ignored.
> The number of failing tests makes releasing more difficult.
> The number of failing tests makes it harder to determine whether recent
> changes actually caused problems. Rerunning the tests until they succeed
> is common at present, which is not robust.
> E-mail notifications of failing tests are largely being ignored.
Proposal:
> Mark all currently "flakey" tests as BadApple or AwaitsFix
> Run Jenkins jobs with BadApple (and/or AwaitsFix) enabled and disabled.
> Frequency TBD, depends partly on whether we can label emails from these runs
> for easy filtering of the two flavors.
>> Label these runs with something suitable in the subject line (wish list)
> Weekly reports on the tests labeled BadApple or AwaitsFix
>> Perhaps this could be incorporated in the reports linked below (wish list)
> Committers should enable BadApple (or AwaitsFix) regularly as a sanity check.
> Leave these as defaults.
> We start getting _much_ more aggressive about not allowing _new_ flakey tests.
NOTE: It's perfectly acceptable to have failing flakey tests as long as someone
is actively working on _fixing_ them.
Concerns with solution:
> Decreases test coverage
> Decreases visibility of flakey tests, making fixing them less likely.
> Some tools (see below) that report on bad tests will not see tests that are
> annotated with BadApple or AwaitsFix.
> Running unit tests and reporting errors are being conflated
To be decided:
> Can we label e-mails with failing tests with something in the subject line
> identifying whether they were run with BadApple/AwaitsFix enabled or
> disabled? Can someone volunteer?
> Is there any difference between BadApple and AwaitsFix? If not, should we
> deprecate one? I propose we just use AwaitsFix and deprecate BadApple.
> Can the automated reports (see below) be enhanced to also report tests
> labeled BadApple or AwaitsFix?
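The subject-line question above could be approached along these lines. This is
only a sketch under assumptions: the label strings and the SubjectLabel class
are hypothetical, and it assumes the run's configuration is visible through the
tests.badapples and tests.awaitsfix system properties. How the Jenkins jobs
would actually pass this into the notification e-mail is still to be decided.

```java
/** Hypothetical helper that prefixes a notification subject with the run flavor. */
final class SubjectLabel {

  /** Returns a bracketed label describing which annotated tests were enabled. */
  static String label(boolean badApples, boolean awaitsFix) {
    if (badApples && awaitsFix) {
      return "[BadApple+AwaitsFix]";
    }
    if (badApples) {
      return "[BadApple]";
    }
    if (awaitsFix) {
      return "[AwaitsFix]";
    }
    // Neither group enabled: the annotated tests were skipped in this run.
    return "[default]";
  }

  /** Builds the full subject line so mail filters can match on the prefix. */
  static String subject(String base, boolean badApples, boolean awaitsFix) {
    return label(badApples, awaitsFix) + " " + base;
  }

  public static void main(String[] args) {
    // Assumption: the test run was started with -Dtests.badapples=true and/or
    // -Dtests.awaitsfix=true; Boolean.getBoolean reads those system properties.
    boolean badApples = Boolean.getBoolean("tests.badapples");
    boolean awaitsFix = Boolean.getBoolean("tests.awaitsfix");
    // The job name below is a made-up placeholder.
    System.out.println(subject("Example-Job build failed", badApples, awaitsFix));
  }
}
```

With a fixed prefix like this, a mail filter only needs to match the bracketed
label to separate the two flavors of runs.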
Useful tools:
> Steve Rowe's work on a Jenkins job to reproduce test failures (LUCENE-8106)
> Hoss has worked on aggregating all test failures from the 3 Jenkins systems
> (ASF, Policeman, and Steve's), downloading the test results & logs, and
> running some reports/stats on failures.
>> http://fucit.org/solr-jenkins-reports/
>> https://github.com/hossman/jenkins-reports/
>> http://fucit.org/solr-jenkins-reports/failure-report.html
I've assigned this JIRA to myself, but all volunteers are welcome, especially
for anything that changes the build system.
I've decided to make this a SOLR JIRA on the theory that most of the offending
tests are in the Solr hive; any sub-tasks for touching the build system can go
under LUCENE if wanted.
Also, I expect to add the annotation to some more tests for a few days as
infrequent failures occur. Once we have stability (defined by there being
little noise) that'll stop.
3 BadApple and 23 AwaitsFix annotations are currently in the code, linked to
these issues:
HADOOP-14044
HADOOP-9893
LUCENE-3869
LUCENE-5575
LUCENE-5595
LUCENE-5737
LUCENE-6709
LUCENE-7161
SOLR-2715
SOLR-6213
SOLR-6443
SOLR-6944
SOLR-7736
SOLR-9036
SOLR-10071
SOLR-10107
SOLR-10136
SOLR-10734
SOLR-10191
SOLR-11134
SOLR-11458
SOLR-11714
SOLR-11974
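
A tally like the one above, and the weekly report from the proposal, could be
produced by scanning test sources for the annotations. The following is a
minimal sketch, not the tool used to build this list: the AnnotationTally class
is hypothetical, it assumes each annotation carries a bugUrl attribute
containing the JIRA key, and it works on source text passed in as a string
rather than walking the checkout. The test methods in the sample are made up.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that tallies BadApple/AwaitsFix annotations in source text. */
final class AnnotationTally {

  // Matches e.g. @AwaitsFix(bugUrl = "...") and the qualified form
  // @LuceneTestCase.BadApple(bugUrl = "..."). Group 1 is the annotation name,
  // group 2 is the bugUrl value.
  private static final Pattern ANNOTATION = Pattern.compile(
      "@(?:\\w+\\.)*(BadApple|AwaitsFix)\\s*\\(\\s*bugUrl\\s*=\\s*\"([^\"]*)\"");

  // Extracts a JIRA key such as SOLR-6944 from the bugUrl value.
  private static final Pattern ISSUE_KEY = Pattern.compile("[A-Z]+-\\d+");

  /** Returns a sorted map from "Annotation ISSUE-KEY" to number of occurrences. */
  static Map<String, Integer> tally(String source) {
    Map<String, Integer> counts = new TreeMap<>();
    Matcher m = ANNOTATION.matcher(source);
    while (m.find()) {
      Matcher key = ISSUE_KEY.matcher(m.group(2));
      String issue = key.find() ? key.group() : "(no issue key)";
      counts.merge(m.group(1) + " " + issue, 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    // Made-up sample source; only the issue key is taken from the list above.
    String sample =
        "@AwaitsFix(bugUrl = \"https://issues.apache.org/jira/browse/SOLR-6944\")\n"
            + "public void testA() {}\n"
            + "@LuceneTestCase.BadApple(bugUrl = \"https://issues.apache.org/jira/browse/SOLR-6944\")\n"
            + "public void testB() {}\n"
            + "@AwaitsFix(bugUrl = \"https://issues.apache.org/jira/browse/SOLR-6944\")\n"
            + "public void testC() {}\n";
    tally(sample).forEach((k, v) -> System.out.println(k + ": " + v));
  }
}
```

Keying the counts on both the annotation name and the issue keeps BadApple and
AwaitsFix separate, which matters as long as the question of deprecating one of
them is open.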
Solr JIRAs about bad tests:
SOLR-2175
SOLR-4147
SOLR-5880
SOLR-6423
SOLR-6944
SOLR-6961
SOLR-6974
SOLR-8122
SOLR-8182
SOLR-9869
SOLR-10053
SOLR-10070
SOLR-10071
SOLR-10139
SOLR-10287
SOLR-10815
SOLR-11911