Hi Ted, thanks a lot, your suggestion is well taken!

Hi All,
I created HADOOP-11045 and uploaded the tool script. I hope you find it
useful. Thanks for reviewing and providing feedback.

Best regards.

--Yongjun

On Sun, Aug 31, 2014 at 12:27 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> How about putting this tool in the dev-support directory?
>
> Thanks
>
> On Aug 30, 2014, at 11:10 PM, Yongjun Zhang <yzh...@cloudera.com> wrote:
>
> > Hi,
> >
> > I developed a tool to detect flaky tests of hadoop jenkins test jobs, on
> > top of the initial work Todd Lipcon did. We find it quite useful, and with
> > Todd's agreement, I'd like to push it upstream so all of us can share it
> > (thanks Todd for the initial work and support). I hope you find the tool
> > useful.
> >
> > This is a tool for hadoop contributors rather than hadoop users, and it can
> > certainly be adapted to projects other than hadoop. I wonder where a good
> > place to put it would be. Your advice is very much appreciated.
> >
> > Please see below the description and example output of the tool.
> >
> > Thanks a lot.
> >
> > --Yongjun
> >
> > Description of the tool:
> >
> > #
> > # Given a jenkins test job, this script examines all runs of the job done
> > # within a specified period of time (number of days prior to the execution
> > # time of this script), and reports all failed tests.
> > #
> > # The output of this script includes a section for each run that has failed
> > # tests, with each failed test name listed.
> > #
> > # More importantly, at the end, it outputs a summary section that lists all
> > # failed tests within all examined runs, indicates how many runs each test
> > # failed in, and sorts all failed tests by that count.
> > #
> > # This way, when we see failed tests in a PreCommit build, we can quickly
> > # tell whether a failed test is a new failure or it failed before, and it
> > # may just be a flaky test.
> > #
> > # Of course, to be 100% sure about the reason for a failed test, a closer
> > # look at the failed test for the specific run is necessary.
> > #
> >
> > Example usage and output of the tool for job Hadoop-Common-0.23-Build,
> > which indicates that the same test failed five times in a row:
> >
> > ./determine-flaky-tests-hadoop.py -j Hadoop-Common-0.23-Build
> > ****Recently FAILED builds in url: https://builds.apache.org//job/Hadoop-Common-0.23-Build
> > THERE ARE 5 builds (out of 5) that have failed tests in the past 14 days, as listed below:
> >
> > ==>https://builds.apache.org/job/Hadoop-Common-0.23-Build/1057/testReport (2014-08-30 02:01:30)
> > Failed test: org.apache.hadoop.io.compress.TestCodec.testSnappyCodec
> > ==>https://builds.apache.org/job/Hadoop-Common-0.23-Build/1056/testReport (2014-08-29 02:01:30)
> > Failed test: org.apache.hadoop.io.compress.TestCodec.testSnappyCodec
> > ==>https://builds.apache.org/job/Hadoop-Common-0.23-Build/1055/testReport (2014-08-28 02:01:30)
> > Failed test: org.apache.hadoop.io.compress.TestCodec.testSnappyCodec
> > ==>https://builds.apache.org/job/Hadoop-Common-0.23-Build/1054/testReport (2014-08-27 02:01:29)
> > Failed test: org.apache.hadoop.io.compress.TestCodec.testSnappyCodec
> > ==>https://builds.apache.org/job/Hadoop-Common-0.23-Build/1053/testReport (2014-08-26 02:01:30)
> > Failed test: org.apache.hadoop.io.compress.TestCodec.testSnappyCodec
> >
> > All failed tests <#occurrences: testName>:
> > 5: org.apache.hadoop.io.compress.TestCodec.testSnappyCodec
> >
> >
> > Another example (for job Hadoop-Hdfs-trunk):
> >
> > [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -n 7
> > ****Recently FAILED builds in url: https://builds.apache.org//job/Hadoop-Hdfs-trunk
> > THERE ARE 7 builds (out of 8) that have failed tests in the past 7 days, as listed below:
> >
> > ==>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1856/testReport (2014-08-30 09:46:54)
> > Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testIdempotentAllocateBlockAndClose
> > Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testFailuresArePerOperation
> > Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testRetryOnChecksumFailure
> > Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testWriteTimeoutAtDataNode
> > Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testDFSClientRetriesOnBusyBlocks
> > Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testClientDNProtocolTimeout
> > Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testGetFileChecksum
> > Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testNamenodeRestart
> > ==>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1855/testReport (2014-08-30 04:31:30)
> > Failed test: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.testBalancer
> > Failed test: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.testUnevenDistribution
> > ==>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1854/testReport (2014-08-29 04:31:30)
> > Could not open testReport
> > ==>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1853/testReport (2014-08-28 09:37:18)
> > Could not open testReport
> > ==>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1852/testReport (2014-08-28 09:28:48)
> > Could not open testReport
> > ==>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1850/testReport (2014-08-27 04:31:30)
> > Failed test: org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS.testEnd2End
> > ==>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1849/testReport (2014-08-26 04:31:29)
> > Failed test: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithSaslDataTransfer.testBalancer0Integrity
> >
> > All failed tests <#occurrences: testName>:
> > 1: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithSaslDataTransfer.testBalancer0Integrity
> > 1: org.apache.hadoop.hdfs.TestDFSClientRetries.testIdempotentAllocateBlockAndClose
> > 1: org.apache.hadoop.hdfs.TestDFSClientRetries.testFailuresArePerOperation
> > 1: org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS.testEnd2End
> > 1: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.testUnevenDistribution
> > 1: org.apache.hadoop.hdfs.TestDFSClientRetries.testRetryOnChecksumFailure
> > 1: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.testBalancer
> > 1: org.apache.hadoop.hdfs.TestDFSClientRetries.testWriteTimeoutAtDataNode
> > 1: org.apache.hadoop.hdfs.TestDFSClientRetries.testDFSClientRetriesOnBusyBlocks
> > 1: org.apache.hadoop.hdfs.TestDFSClientRetries.testClientDNProtocolTimeout
> > 1: org.apache.hadoop.hdfs.TestDFSClientRetries.testGetFileChecksum
> > 1: org.apache.hadoop.hdfs.TestDFSClientRetries.testNamenodeRestart
> > [yzhang@localhost jenkinsftf]$
> >
> >
> >
> > On Thu, Aug 28, 2014 at 8:04 PM, Yongjun Zhang <yzh...@cloudera.com> wrote:
> >
> >> Hi,
> >>
> >> I just noticed that the recent jenkins test report doesn't include a link
> >> to the test result; however, the email notice does show the failed tests:
> >>
> >> E.g.
> >>
> >> https://builds.apache.org/job/PreCommit-HDFS-Build/7846//
> >>
> >> Example old job report that has the link:
> >>
> >> https://builds.apache.org/job/PreCommit-HDFS-Build/7590/
> >>
> >> Would anyone please take a look?
> >>
> >> Thanks a lot.
> >>
> >> --Yongjun
> >>
> >> On Thu, Aug 28, 2014 at 4:21 PM, Karthik Kambatla <ka...@cloudera.com>
> >> wrote:
> >>
> >>> Thanks Giri and Ted for fixing the builds.
> >>>
> >>>
> >>> On Thu, Aug 28, 2014 at 9:49 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> >>>
> >>>> Charles:
> >>>> QA build is running for your JIRA:
> >>>> https://builds.apache.org/job/PreCommit-hdfs-Build/7828/parameters/
> >>>>
> >>>> Cheers
> >>>>
> >>>>
> >>>> On Thu, Aug 28, 2014 at 9:41 AM, Charles Lamb <cl...@cloudera.com> wrote:
> >>>>
> >>>>> On 8/28/2014 12:07 PM, Giridharan Kesavan wrote:
> >>>>>
> >>>>>> Fixed all 3 pre-commit builds. test-patch's git reset --hard is
> >>>>>> removing the patchprocess dir, so moved it off the workspace.
> >>>>>
> >>>>> Thanks Giri. Should I resubmit HDFS-6954's patch? I've gotten 3 or 4
> >>>>> jenkins messages that indicated the problem so something is resubmitting,
> >>>>> but now that you've fixed it, should I resubmit it again?
> >>>>>
> >>>>> Charles
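
For anyone who wants a feel for the summary step in the tool description quoted above (count how many runs each test failed in, then sort by that count), here is a minimal sketch in Python. It is not the script attached to HADOOP-11045 and it does not talk to Jenkins. The function names and the in-memory `runs` structure are hypothetical, and the tie-break by test name is my own assumption, added only to make the output deterministic.

```python
from collections import Counter


def summarize_failures(runs):
    """Count, for each test, how many runs it failed in.

    `runs` is a hypothetical in-memory structure: a list of
    (run_url, failed_test_names) pairs, one pair per examined run.
    """
    counts = Counter()
    for _run_url, failed_tests in runs:
        # A test counts at most once per run, even if it is listed twice.
        counts.update(set(failed_tests))
    # Most frequently failing tests first; ties broken by name (assumption).
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def format_summary(summary):
    """Render the summary in the '<#occurrences: testName>' style shown above."""
    lines = ["All failed tests <#occurrences: testName>:"]
    lines.extend(f"{count}: {name}" for name, count in summary)
    return "\n".join(lines)
```

Fed the five Hadoop-Common-0.23-Build runs from the first example, each with the single failure org.apache.hadoop.io.compress.TestCodec.testSnappyCodec, `summarize_failures` would return one entry with a count of 5, which is the kind of signal that separates a consistently broken test from a flaky one.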