[jira] [Comment Edited] (SOLR-12037) Reduce noise from flakey tests

Yonik Seeley (JIRA) Mon, 05 Mar 2018 09:33:15 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-12037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377995#comment-16377995
 ]


Yonik Seeley edited comment on SOLR-12037 at 3/5/18 5:32 PM:
-------------------------------------------------------------

Rolling up comments from deleted SOLR-2016:

I'll mash all the comments together rather than have a bunch of individual 
comments from SOLR-2016, they're in the original order:

Erick Erickson:
These are JIRAs linked to by current BadApple and AwaitsFix annotations.

-----------------
Robert Muir:
I don't think the lucene tests marked with AwaitsFix belong here at all. Please 
don't remove the annotations for these. They are simply tests that fail due to 
known open bugs in jira. They aren't flaky at all.**

Yonik Seeley:
> Also, I expect to add the annotation to some more tests for a few days as 
> infrequent failures occur.
Per the email discussion, this should happen after the default for "ant test" 
is changed (the one that committers run manually before committing) so that 
regressions will be caught before they are committed.

-----------------
Erick Erickson:
Robert:
I was unclear. The only AwaitsFix annotations I was thinking of removing were 
ones where the linked JIRA has been fixed. I don't think there are any in 
Lucene that fit that criterion anyway...
I'll add annotations to reduce noise though, mostly in Solr. I think there are 
a couple of occasionally-failing tests in Lucene, but I'll be sure to discuss 
those ahead of time before adding any annotations there.

Yonik:
Right, I was thinking that we'd make any changes to the build system first, 
then add any annotations or the like. My guess is there'll be a day or two 
where things might be wonky but I'll be sure to let people know.
In fact, One possible outcome here is that the build system change is just to 
add a tag to the subject line of e-mails indicating whether the build was run 
with or without the tests disabled and not change the defaults at all.... That 
decision has to be made first and coordinated with adding annotations.

-----------------
Uwe Schindler:
Hi,
here is the patch for the build system changes: SOLR-12016-buildsystem.patch :
•       Enables @BadApple tests by default. We should now make sure and review 
all current @BadApple tests if they should maybe move to @AwaitsFix.
•       Adds a missing help text in the ant test-help output
•       Adds a new Jenkins job: ant jenkins-badapples
•       No change to @AwaitsFix: Those stay disabled for jenkins and developers 
and the test runner will just print the usual warning. So we should only be 
used for tests that are definitely broken and fail 100% of the time (e.g. cause 
is known). In contrast @BadApple should be used for tests that fail sometimes 
(like <30% of all runs). For tests failing more often it's also good to move 
them to @AwaitsFix, as those make no sense to run. In both cases a JIRA-Issue 
has to be linked.
For Erick Erickson: We should maybe ad a precommit test that complains about 
tests marked with any of those annotations, but the related JIRA issues is 
resolved/closed.

-----------------
Uwe Schindler:
IMHO, the smoke tester should also disable flakey tests, because those should 
not stop releases. I'll add a patch.

Uwe Schindler:
I updated the patch, to disable badapples on smoketester. Maybe somebody 
verifies this change, as I have no way to test it and my knowledge of python is 
zero.

-----------------
Erick Erickson:
Uwe Schindler This is excellent. I'm flip-flopping on whether to go ahead or 
wait until 7.3 is cut. On the one hand it'd be nice to say "As of 7.3, the 
noise is stopped". On the other, there wouldn't be much time for stuff to bake. 
Yonik's comments about reducing test coverage are germane, especially just 
before a release.
On the other-other hand, having the noisy tests silenced would make the 7.3 
release smoother.
Opinions anyone?

-----------------
Uwe Schindler:
I'd like to do the next release without the test noise. You may have noticed 
that I did not help running smoke tester and I did not vote for past releases. 
One of the reasonos was, that I was not able to get smoke tester running at 
all! In addition, since 7.3, we will run Jenkins on both Java 8 and Java 9 to 
make sure it runs with Java 9, too (MR-JARs,...). So the chance that it fails 
is twice as high. So it's now impossible to pass smoke tester. See the jenkins 
logs, we had not even one successful smoker run!
To do a relaese with 7.3 code base, we should get this in first. Thanks, Uwe

-----------------
Erick Erickson:
Uwe Schindler The RMs job (and all volunteers who run smoke tests) is hard 
enough as it is, so I'll start working on this. The initial cut shouldn't take 
long at all given Hoss' reports.
The idea of a precommit test is a good one, but I confess I won't get to it.
I'll fold your build system changes in at the same time of course.


Uwe Schindler I have one suggestion though. Running with BadApple enabled on 
Jenkins all the time still seems like it makes it harder to catch regressions. 
If some % of runs ran with badapples=true and the rest with badapples=false 
then any failures with badapples=false would be cause for reverting the JIRA 
that caused it or fixing immediately. The idea here is to keep more flakey 
tests from creeping in.
It'd be easy enough to set up a mail filter to make failures with 
badapples=false stand out and then we could complain loudly. That would still 
allow Hoss' reports to be valid since there'd also be badapples=true tests for 
them to chew on. In that case we might slightly modify the use of AwaitsFix to 
mean "only tests that have a known cause". Or maybe just run with 
badapples=false nightly. Or??? If/when we work through the current situation, 
we could set badapples=true all the time. WDYT?
I'll see if I can get the first cut ready to rock-n-roll this weekend. After 
that I expect a (hopefully short) period of adding BadApple annotations as 
less-frequently failing tests show up..
All:
Expect a number of JIRAs to change status over the next few days as I 
regularize the current use of BadApple and AwaitsFix. I'll probably raise a new 
JIRA we can point BadApple tests at explaining the situation rather than make 
someone wade through this one. Unless they contain specific information that 
would help someone trying to debug them, I'll point the tests at the new JIRA 
and close the old ones.

-----------------
Uwe Schindler:
I'd setup different jenkins jobs. One with BadApples disabled that runs all the 
time ("ant jenkins-hourly" or "ant jenkins-nightly"). Once per day or once per 
two days we schedule one that runs "ant jenkins-badapples". This would have a 
different subject line so you can filter on them.
Everything else is too complicated from Jenkins point of view. Please, please 
let us NOT run the BadApples on jenkins hourly runs!

-----------------
Erick Erickson:
Whatever works, I'll happily defer to your expertise here.
So I'll get this stuff sorted out and then we'll worry about setting up the 
proper Jenkins jobs, correct?

-----------------
Uwe Schindler:
Yes.

-----------------
Uwe Schindler:
Hi Erick Erickson: I updated the patch. During my merge actions, I missed to 
actually enable @BadApple tests - I just updated all comments, but the main 
change in the @TestGroup annotation was lost. Now it's fine.
I still have not tested the smoker change.




was (Author: erickerickson):
Rolling up comments from deleted SOLR-2016:

I'll mash all the comments together rather than have a bunch of individual 
comments from SOLR-2016, they're in the original order:

Erick Erickson:
These are JIRAs linked to by current BadApple and AwaitsFix annotations.

-----------------
Robert Muir:
I don't think the lucene tests marked with AwaitsFix belong here at all. Please 
don't remove the annotations for these. They are simply tests that fail due to 
known open bugs in jira. They aren't flaky at all.**

Yonik Seeley:
Also, I expect to add the annotation to some more tests for a few days as 
infrequent failures occur.
Per the email discussion, this should happen after the default for "ant test" 
is changed (the one that committers run manually before committing) so that 
regressions will be caught before they are committed.

-----------------
Erick Erickson:
Robert:
I was unclear. The only AwaitsFix annotations I was thinking of removing were 
ones where the linked JIRA has been fixed. I don't think there are any in 
Lucene that fit that criterion anyway...
I'll add annotations to reduce noise though, mostly in Solr. I think there are 
a couple of occasionally-failing tests in Lucene, but I'll be sure to discuss 
those ahead of time before adding any annotations there.

Yonik:
Right, I was thinking that we'd make any changes to the build system first, 
then add any annotations or the like. My guess is there'll be a day or two 
where things might be wonky but I'll be sure to let people know.
In fact, One possible outcome here is that the build system change is just to 
add a tag to the subject line of e-mails indicating whether the build was run 
with or without the tests disabled and not change the defaults at all.... That 
decision has to be made first and coordinated with adding annotations.

-----------------
Uwe Schindler:
Hi,
here is the patch for the build system changes: SOLR-12016-buildsystem.patch :
•       Enables @BadApple tests by default. We should now make sure and review 
all current @BadApple tests if they should maybe move to @AwaitsFix.
•       Adds a missing help text in the ant test-help output
•       Adds a new Jenkins job: ant jenkins-badapples
•       No change to @AwaitsFix: Those stay disabled for jenkins and developers 
and the test runner will just print the usual warning. So we should only be 
used for tests that are definitely broken and fail 100% of the time (e.g. cause 
is known). In contrast @BadApple should be used for tests that fail sometimes 
(like <30% of all runs). For tests failing more often it's also good to move 
them to @AwaitsFix, as those make no sense to run. In both cases a JIRA-Issue 
has to be linked.
For Erick Erickson: We should maybe ad a precommit test that complains about 
tests marked with any of those annotations, but the related JIRA issues is 
resolved/closed.

-----------------
Uwe Schindler:
IMHO, the smoke tester should also disable flakey tests, because those should 
not stop releases. I'll add a patch.

Uwe Schindler:
I updated the patch, to disable badapples on smoketester. Maybe somebody 
verifies this change, as I have no way to test it and my knowledge of python is 
zero.

-----------------
Erick Erickson:
Uwe Schindler This is excellent. I'm flip-flopping on whether to go ahead or 
wait until 7.3 is cut. On the one hand it'd be nice to say "As of 7.3, the 
noise is stopped". On the other, there wouldn't be much time for stuff to bake. 
Yonik's comments about reducing test coverage are germane, especially just 
before a release.
On the other-other hand, having the noisy tests silenced would make the 7.3 
release smoother.
Opinions anyone?

-----------------
Uwe Schindler:
I'd like to do the next release without the test noise. You may have noticed 
that I did not help running smoke tester and I did not vote for past releases. 
One of the reasonos was, that I was not able to get smoke tester running at 
all! In addition, since 7.3, we will run Jenkins on both Java 8 and Java 9 to 
make sure it runs with Java 9, too (MR-JARs,...). So the chance that it fails 
is twice as high. So it's now impossible to pass smoke tester. See the jenkins 
logs, we had not even one successful smoker run!
To do a relaese with 7.3 code base, we should get this in first. Thanks, Uwe

-----------------
Erick Erickson:
Uwe Schindler The RMs job (and all volunteers who run smoke tests) is hard 
enough as it is, so I'll start working on this. The initial cut shouldn't take 
long at all given Hoss' reports.
The idea of a precommit test is a good one, but I confess I won't get to it.
I'll fold your build system changes in at the same time of course.


Uwe Schindler I have one suggestion though. Running with BadApple enabled on 
Jenkins all the time still seems like it makes it harder to catch regressions. 
If some % of runs ran with badapples=true and the rest with badapples=false 
then any failures with badapples=false would be cause for reverting the JIRA 
that caused it or fixing immediately. The idea here is to keep more flakey 
tests from creeping in.
It'd be easy enough to set up a mail filter to make failures with 
badapples=false stand out and then we could complain loudly. That would still 
allow Hoss' reports to be valid since there'd also be badapples=true tests for 
them to chew on. In that case we might slightly modify the use of AwaitsFix to 
mean "only tests that have a known cause". Or maybe just run with 
badapples=false nightly. Or??? If/when we work through the current situation, 
we could set badapples=true all the time. WDYT?
I'll see if I can get the first cut ready to rock-n-roll this weekend. After 
that I expect a (hopefully short) period of adding BadApple annotations as 
less-frequently failing tests show up..
All:
Expect a number of JIRAs to change status over the next few days as I 
regularize the current use of BadApple and AwaitsFix. I'll probably raise a new 
JIRA we can point BadApple tests at explaining the situation rather than make 
someone wade through this one. Unless they contain specific information that 
would help someone trying to debug them, I'll point the tests at the new JIRA 
and close the old ones.

-----------------
Uwe Schindler:
I'd setup different jenkins jobs. One with BadApples disabled that runs all the 
time ("ant jenkins-hourly" or "ant jenkins-nightly"). Once per day or once per 
two days we schedule one that runs "ant jenkins-badapples". This would have a 
different subject line so you can filter on them.
Everything else is too complicated from Jenkins point of view. Please, please 
let us NOT run the BadApples on jenkins hourly runs!

-----------------
Erick Erickson:
Whatever works, I'll happily defer to your expertise here.
So I'll get this stuff sorted out and then we'll worry about setting up the 
proper Jenkins jobs, correct?

-----------------
Uwe Schindler:
Yes.

-----------------
Uwe Schindler:
Hi Erick Erickson: I updated the patch. During my merge actions, I missed to 
actually enable @BadApple tests - I just updated all comments, but the main 
change in the @TestGroup annotation was lost. Now it's fine.
I still have not tested the smoker change.



> Reduce noise from flakey tests
> ------------------------------
>
>                 Key: SOLR-12037
>                 URL: https://issues.apache.org/jira/browse/SOLR-12037
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Tests
>    Affects Versions: 7.2
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>            Priority: Major
>
> Recreating SOLR-12016. Please do NOT delete this without discussion. NOTE: 
> Uwe's build system modifications originally on 12016 have been incorporated 
> into SOLR-12028.
> Current situation concerns:
> > There is so much noise from flakey tests (particularly Solr tests) that 
> > they are difficult to use.
> > The number of tests that regularly fail is increasing
> > Failures are being ignored
> > The number of failing tests makes releasing more difficult.
> > The number of failing tests make it harder to determine whether recent 
> > changes actually caused problems. Running the tests again until they 
> > succeed is used commonly at present, which is not robust.
> > e-mail notifications of failing tests are largely being ignored.
> Propsal:
> > Mark all currently "flakey" tests as BadApple or AwaitsFix
> > Run Jenkins jobs with BadApple (and/or AwaitsFix) enabled and disabled. 
> > Frequency TBD, depends partly on whether we can label emails from these 
> > runs for easy filtering of the two flavors.
> >> Label these runs with something suitable in the subject line (wish list)
> > Weekly reports on the tests labeled BadApple or AwaitsFix
> >> Perhaps this could be incorporated in the reports linked below (wish list)
> > Committers should enable BadApple (or AwaitsFix) regularly as a sanity 
> > check. Leave these as defaults.
> > We start getting much more aggressive about not allowing new flakey tests.
> NOTE: It's perfectly acceptable to have failing flakey tests as long as 
> someone is activey working on fixing them.
> Concerns with solution
> > Decreases test coverage
> > Decreases visibility of flakey tests, making fixing them less likely.
> > Some tools (see below) that report on bad tests will not see tests that are 
> > annotated with BadApple or AwaitsFix.
> > Running unit tests and reporting errors are being conflated
> To be decided:
> > Can we label e-mails with failing tests with something in the subject line 
> > identifying whether they were run with BadApple/Awaits fix enabled or 
> > disabled? Can someone volunteer?
> > Is there any difference between BadApple and AwaitsFix? If not should we 
> > deprecate one? I propose we just use AwaitsFix and deprecate BadApple.
> > Can the automated reports (see below) be enhanced to also report tests 
> > labeled BadApple or AwaitsFix?
> Useful tools:
> > Steve Rowe's work on a Jenkins job to reproduce test failures (LUCENE-8106) 
> > Hoss has worked on aggregating all test failures from the 3 Jenkins systems 
> > (ASF, Policeman, and Steve's), downloading the test results & logs, and 
> > running some reports/stats on failures.
> >> http://fucit.org/solr-jenkins-reports/
> >> https://github.com/hossman/jenkins-reports/
> >> http://fucit.org/solr-jenkins-reports/failure-report.html
> I've assigned this JIRA to myslef, but all volunteers welcome, especially 
> anything that changes the build system.....
> I've decided to make this a SOLR jira on the theory that most of the 
> offending tests are in the Solr hive, any sub-tasks for touching the build 
> system can go under LUCENE if wanted.
> Also, I expect to add the annotation to some more tests for a few days as 
> infrequent failures occur. Once we have stability (defined by there being 
> little noise) that'll stop.
> 3 BadApple 23 AwaitsFix annotations are currently in the code, linked to 
> these issues:
> HADOOP-9893
> LUCENE-3869
> LUCENE-5575")
> LUCENE-5595
> LUCENE-5737
> LUCENE-6709
> LUCENE-7161
> SOLR-2715
> SOLR-6213
> SOLR-6443
> SOLR-6944
> SOLR-10071
> SOLR-10136
> SOLR-10734
> SOLR-11974
> Solr JIRAS about bad tests
> SOLR-2175
> SOLR-4147
> SOLR-5880
> SOLR-6423
> SOLR-6944
> SOLR-6961
> SOLR-6974
> SOLR-8122
> SOLR-8182
> SOLR-9869
> SOLR-10053
> SOLR-10070
> SOLR-10071
> SOLR-10139
> SOLR-10287
> SOLR-11911



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Comment Edited] (SOLR-12037) Reduce noise from flakey tests

Reply via email to