Hi,

I developed a tool to detect flaky tests in Hadoop Jenkins test jobs, building on the initial work Todd Lipcon did. We find it quite useful, and with Todd's agreement, I'd like to push it upstream so all of us can share it (thanks, Todd, for the initial work and support). I hope you find the tool useful.

This is a tool for Hadoop contributors rather than Hadoop users, and it can certainly be adapted to projects other than Hadoop. I wonder where would be a good place to put it. Your advice is very much appreciated.

Please see below the description and example output of the tool.

Thanks a lot.

--Yongjun

Description of the tool:

#
# Given a jenkins test job, this script examines all runs of the job done
# within a specified period of time (number of days prior to the execution
# time of this script), and reports all failed tests.
#
# The output of this script includes a section for each run that has failed
# tests, with each failed test name listed.
#
# More importantly, at the end, it outputs a summary section that lists all
# failed tests within all examined runs, indicates how many runs each test
# failed in, and sorts the failed tests by that count.
#
# This way, when we see failed tests in a PreCommit build, we can quickly tell
# whether a failed test is a new failure or has failed before, in which case
# it may just be a flaky test.
#
# Of course, to be 100% sure about the reason for a failed test, a closer look
# at the failed test for the specific run is necessary.
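
To make the summary step concrete, here is a minimal Python sketch of the windowing and counting described above. It is an illustration only, not the script's actual code: the function names `runs_in_window` and `count_failing_runs` and the dict layout with `when` and `failed` keys are assumptions made for this example, and fetching the run data from Jenkins is left out entirely.

```python
from collections import Counter
from datetime import datetime, timedelta


def runs_in_window(runs, now, num_days):
    """Keep only the runs that started within the last num_days days.

    Each run is a dict with a 'when' datetime and a 'failed' list of
    test names (a hypothetical layout chosen for this sketch).
    """
    cutoff = now - timedelta(days=num_days)
    return [run for run in runs if run["when"] >= cutoff]


def count_failing_runs(runs):
    """Return (test_name, number_of_runs_it_failed_in) pairs.

    A test is counted at most once per run. Pairs are sorted by count,
    highest first, with ties broken by test name for stable output.
    """
    counts = Counter()
    for run in runs:
        counts.update(set(run["failed"]))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Made-up sample data, only to show the shape of the summary.
    sample = [
        {"when": datetime(2014, 8, 30, 2, 1, 30), "failed": ["TestA.testX"]},
        {"when": datetime(2014, 8, 29, 2, 1, 30), "failed": ["TestA.testX", "TestB.testY"]},
    ]
    recent = runs_in_window(sample, datetime(2014, 8, 30, 12, 0, 0), 14)
    print("All failed tests <#occurrences: testName>:")
    for name, n in count_failing_runs(recent):
        print("    %d: %s" % (n, name))
```

A test that sits at the top of this list with a count close to the number of examined runs is more likely a persistent failure than a flaky one, which is what the first example below shows.
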

Example usage and output of the tool for job Hadoop-Common-0.23-Build, which indicates that the same test failed five times in a row:

./determine-flaky-tests-hadoop.py -j Hadoop-Common-0.23-Build
****Recently FAILED builds in url: https://builds.apache.org//job/Hadoop-Common-0.23-Build
    THERE ARE 5 builds (out of 5) that have failed tests in the past 14 days, as listed below:

==>https://builds.apache.org/job/Hadoop-Common-0.23-Build/1057/testReport (2014-08-30 02:01:30)
    Failed test: org.apache.hadoop.io.compress.TestCodec.testSnappyCodec
==>https://builds.apache.org/job/Hadoop-Common-0.23-Build/1056/testReport (2014-08-29 02:01:30)
    Failed test: org.apache.hadoop.io.compress.TestCodec.testSnappyCodec
==>https://builds.apache.org/job/Hadoop-Common-0.23-Build/1055/testReport (2014-08-28 02:01:30)
    Failed test: org.apache.hadoop.io.compress.TestCodec.testSnappyCodec
==>https://builds.apache.org/job/Hadoop-Common-0.23-Build/1054/testReport (2014-08-27 02:01:29)
    Failed test: org.apache.hadoop.io.compress.TestCodec.testSnappyCodec
==>https://builds.apache.org/job/Hadoop-Common-0.23-Build/1053/testReport (2014-08-26 02:01:30)
    Failed test: org.apache.hadoop.io.compress.TestCodec.testSnappyCodec

All failed tests <#occurrences: testName>:
    5: org.apache.hadoop.io.compress.TestCodec.testSnappyCodec

Another example (for job Hadoop-Hdfs-trunk):

[yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -n 7
****Recently FAILED builds in url: https://builds.apache.org//job/Hadoop-Hdfs-trunk
    THERE ARE 7 builds (out of 8) that have failed tests in the past 7 days, as listed below:

==>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1856/testReport (2014-08-30 09:46:54)
    Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testIdempotentAllocateBlockAndClose
    Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testFailuresArePerOperation
    Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testRetryOnChecksumFailure
    Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testWriteTimeoutAtDataNode
    Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testDFSClientRetriesOnBusyBlocks
    Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testClientDNProtocolTimeout
    Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testGetFileChecksum
    Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testNamenodeRestart
==>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1855/testReport (2014-08-30 04:31:30)
    Failed test: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.testBalancer
    Failed test: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.testUnevenDistribution
==>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1854/testReport (2014-08-29 04:31:30)
    Could not open testReport
==>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1853/testReport (2014-08-28 09:37:18)
    Could not open testReport
==>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1852/testReport (2014-08-28 09:28:48)
    Could not open testReport
==>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1850/testReport (2014-08-27 04:31:30)
    Failed test: org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS.testEnd2End
==>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1849/testReport (2014-08-26 04:31:29)
    Failed test: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithSaslDataTransfer.testBalancer0Integrity

All failed tests <#occurrences: testName>:
    1: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithSaslDataTransfer.testBalancer0Integrity
    1: org.apache.hadoop.hdfs.TestDFSClientRetries.testIdempotentAllocateBlockAndClose
    1: org.apache.hadoop.hdfs.TestDFSClientRetries.testFailuresArePerOperation
    1: org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS.testEnd2End
    1: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.testUnevenDistribution
    1: org.apache.hadoop.hdfs.TestDFSClientRetries.testRetryOnChecksumFailure
    1: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.testBalancer
    1: org.apache.hadoop.hdfs.TestDFSClientRetries.testWriteTimeoutAtDataNode
    1: org.apache.hadoop.hdfs.TestDFSClientRetries.testDFSClientRetriesOnBusyBlocks
    1: org.apache.hadoop.hdfs.TestDFSClientRetries.testClientDNProtocolTimeout
    1: org.apache.hadoop.hdfs.TestDFSClientRetries.testGetFileChecksum
    1: org.apache.hadoop.hdfs.TestDFSClientRetries.testNamenodeRestart
[yzhang@localhost jenkinsftf]$


On Thu, Aug 28, 2014 at 8:04 PM, Yongjun Zhang <yzh...@cloudera.com> wrote:

> Hi,
>
> I just noticed that the recent jenkin test report doesn't include link to
> test result, however, the email notice does show the failed tests:
>
> E.g.
>
> https://builds.apache.org/job/PreCommit-HDFS-Build/7846//
>
> Example old job report that has the link:
>
> https://builds.apache.org/job/PreCommit-HDFS-Build/7590/
>
> Would any one please take a look?
>
> Thanks a lot.
>
> --Yongjun
>
> On Thu, Aug 28, 2014 at 4:21 PM, Karthik Kambatla <ka...@cloudera.com>
> wrote:
>
>> Thanks Giri and Ted for fixing the builds.
>>
>>
>> On Thu, Aug 28, 2014 at 9:49 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>> > Charles:
>> > QA build is running for your JIRA:
>> > https://builds.apache.org/job/PreCommit-hdfs-Build/7828/parameters/
>> >
>> > Cheers
>> >
>> >
>> > On Thu, Aug 28, 2014 at 9:41 AM, Charles Lamb <cl...@cloudera.com>
>> wrote:
>> >
>> > > On 8/28/2014 12:07 PM, Giridharan Kesavan wrote:
>> > >
>> > >> Fixed all the 3 pre-commit buids. test-patch's git reset --hard is
>> > >> removing
>> > >> the patchprocess dir, so moved it off the workspace.
>> > >>
>> > >>
>> > > Thanks Giri. Should I resubmit HDFS-6954's patch? I've gotten 3 or 4
>> > > jenkins messages that indicated the problem so something is
>> resubmitting,
>> > > but now that you've fixed it, should I resubmit it again?
>> > >
>> > > Charles
>> > >
>> > >
>> >
>> >
>