Hi Ted, thanks a lot; your suggestion is well taken!

Hi All,

I created HADOOP-11045 and uploaded the tool script. I hope you find it
useful. Thanks for reviewing and providing feedback.

Best regards.

--Yongjun

On Sun, Aug 31, 2014 at 12:27 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> How about putting this tool in the dev-support directory?
>
> Thanks
>
> On Aug 30, 2014, at 11:10 PM, Yongjun Zhang <yzh...@cloudera.com> wrote:
>
> > Hi,
> >
> > I developed a tool to detect flaky tests in Hadoop Jenkins test jobs, on
> > top of the initial work Todd Lipcon did. We find it quite useful, and with
> > Todd's agreement, I'd like to push it upstream so all of us can share it
> > (thanks, Todd, for the initial work and support). I hope you find the tool
> > useful.
> >
> > This is a tool for Hadoop contributors rather than Hadoop users, and it can
> > certainly be adapted to projects other than Hadoop. I wonder where a good
> > place to put it would be. Your advice is very much appreciated.
> >
> > Please see below the description and example output of the tool.
> >
> > Thanks a lot.
> >
> > --Yongjun
> >
> > Description of the tool:
> >
> > #
> > # Given a Jenkins test job, this script examines all runs of the job done
> > # within a specified period of time (number of days prior to the execution
> > # time of this script), and reports all failed tests.
> > #
> > # The output of this script includes a section for each run that has failed
> > # tests, with each failed test name listed.
> > #
> > # More importantly, at the end, it outputs a summary section that lists all
> > # failed tests within all examined runs, indicates how many runs each test
> > # failed in, and sorts the failed tests by that count.
> > #
> > # This way, when we see failed tests in a PreCommit build, we can quickly tell
> > # whether a failed test is a new failure or has failed before, in which case
> > # it may just be a flaky test.
> > #
> > # Of course, to be 100% sure about the reason for a failed test, a closer look
> > # at the failed test for the specific run is necessary.
> > #
> >
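The summary step described above (count how many runs each test failed in, then sort by that count) can be sketched roughly as follows. This is a minimal illustration, not the script attached to HADOOP-11045: the function name `summarize_failures`, the input mapping, and the sample test names are all hypothetical, and it works on in-memory data rather than querying Jenkins. Breaking ties by test name is also an assumption made here for stable output.

```python
from collections import Counter


def summarize_failures(failed_by_run):
    """Count, for each test, the number of runs in which it failed.

    failed_by_run is a hypothetical mapping from a run identifier
    (for example a build number) to the failed test names of that run.
    Returns (test_name, run_count) pairs, most frequent failures first.
    """
    counts = Counter()
    for tests in failed_by_run.values():
        # Use a set so a test is counted at most once per run.
        counts.update(set(tests))
    # Sort by descending run count; ties are broken by test name
    # (an assumption of this sketch).
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Illustrative data only, not real Jenkins results.
    runs = {
        101: ["pkg.TestA.testX", "pkg.TestB.testY"],
        102: ["pkg.TestA.testX"],
        103: [],
    }
    print("All failed tests <#occurrences: testName>:")
    for name, run_count in summarize_failures(runs):
        print(f"   {run_count}: {name}")
```

Counting per run rather than per failure matches the description's "how many runs each test failed in".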
> > Example usage and output of the tool for job Hadoop-Common-0.23-Build,
> > which indicates that the same test failed five times in a row:
> >
> > ./determine-flaky-tests-hadoop.py -j Hadoop-Common-0.23-Build
> > ****Recently FAILED builds in url:
> > https://builds.apache.org//job/Hadoop-Common-0.23-Build
> >    THERE ARE 5 builds (out of 5) that have failed tests in the past 14 days, as listed below:
> >
> > ==> https://builds.apache.org/job/Hadoop-Common-0.23-Build/1057/testReport (2014-08-30 02:01:30)
> >    Failed test: org.apache.hadoop.io.compress.TestCodec.testSnappyCodec
> > ==> https://builds.apache.org/job/Hadoop-Common-0.23-Build/1056/testReport (2014-08-29 02:01:30)
> >    Failed test: org.apache.hadoop.io.compress.TestCodec.testSnappyCodec
> > ==> https://builds.apache.org/job/Hadoop-Common-0.23-Build/1055/testReport (2014-08-28 02:01:30)
> >    Failed test: org.apache.hadoop.io.compress.TestCodec.testSnappyCodec
> > ==> https://builds.apache.org/job/Hadoop-Common-0.23-Build/1054/testReport (2014-08-27 02:01:29)
> >    Failed test: org.apache.hadoop.io.compress.TestCodec.testSnappyCodec
> > ==> https://builds.apache.org/job/Hadoop-Common-0.23-Build/1053/testReport (2014-08-26 02:01:30)
> >    Failed test: org.apache.hadoop.io.compress.TestCodec.testSnappyCodec
> >
> > All failed tests <#occurrences: testName>:
> >    5: org.apache.hadoop.io.compress.TestCodec.testSnappyCodec
> >
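One way to read a summary like the one above is to compare each test's failure count with the number of runs examined. The sketch below is a heuristic added for illustration, not something the tool itself does: the function name `classify_failures` and the rule "failed in every examined run means consistently failing, otherwise a flaky candidate" are assumptions, and a closer look at the individual runs is still needed either way.

```python
def classify_failures(failure_counts, total_runs):
    """Split failed tests into consistent failures and flaky candidates.

    failure_counts maps test name -> number of runs the test failed in.
    total_runs is the number of runs that were examined.
    Both the inputs and the classification rule are assumptions of this
    sketch, not behavior of the actual script.
    """
    consistent, intermittent = [], []
    for name, count in sorted(failure_counts.items()):
        if count <= 0:
            continue  # never failed in the examined window
        if count >= total_runs:
            consistent.append(name)    # failed in every examined run
        else:
            intermittent.append(name)  # failed in only some runs
    return consistent, intermittent


if __name__ == "__main__":
    # Counts taken from the Hadoop-Common-0.23-Build example: 5 of 5 runs.
    counts = {"org.apache.hadoop.io.compress.TestCodec.testSnappyCodec": 5}
    consistent, intermittent = classify_failures(counts, total_runs=5)
    print("consistent:", consistent)
    print("intermittent:", intermittent)
```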
> >
> > Another example (for job Hadoop-Hdfs-trunk):
> >
> > [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -n 7
> > ****Recently FAILED builds in url:
> > https://builds.apache.org//job/Hadoop-Hdfs-trunk
> >    THERE ARE 7 builds (out of 8) that have failed tests in the past 7 days, as listed below:
> >
> > ==>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1856/testReport (2014-08-30 09:46:54)
> >    Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testIdempotentAllocateBlockAndClose
> >    Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testFailuresArePerOperation
> >    Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testRetryOnChecksumFailure
> >    Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testWriteTimeoutAtDataNode
> >    Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testDFSClientRetriesOnBusyBlocks
> >    Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testClientDNProtocolTimeout
> >    Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testGetFileChecksum
> >    Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testNamenodeRestart
> > ==>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1855/testReport (2014-08-30 04:31:30)
> >    Failed test: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.testBalancer
> >    Failed test: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.testUnevenDistribution
> > ==>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1854/testReport (2014-08-29 04:31:30)
> >   Could not open testReport
> > ==>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1853/testReport (2014-08-28 09:37:18)
> >   Could not open testReport
> > ==>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1852/testReport (2014-08-28 09:28:48)
> >   Could not open testReport
> > ==>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1850/testReport (2014-08-27 04:31:30)
> >    Failed test: org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS.testEnd2End
> > ==>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1849/testReport (2014-08-26 04:31:29)
> >    Failed test: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithSaslDataTransfer.testBalancer0Integrity
> >
> > All failed tests <#occurrences: testName>:
> >    1: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithSaslDataTransfer.testBalancer0Integrity
> >    1: org.apache.hadoop.hdfs.TestDFSClientRetries.testIdempotentAllocateBlockAndClose
> >    1: org.apache.hadoop.hdfs.TestDFSClientRetries.testFailuresArePerOperation
> >    1: org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS.testEnd2End
> >    1: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.testUnevenDistribution
> >    1: org.apache.hadoop.hdfs.TestDFSClientRetries.testRetryOnChecksumFailure
> >    1: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.testBalancer
> >    1: org.apache.hadoop.hdfs.TestDFSClientRetries.testWriteTimeoutAtDataNode
> >    1: org.apache.hadoop.hdfs.TestDFSClientRetries.testDFSClientRetriesOnBusyBlocks
> >    1: org.apache.hadoop.hdfs.TestDFSClientRetries.testClientDNProtocolTimeout
> >    1: org.apache.hadoop.hdfs.TestDFSClientRetries.testGetFileChecksum
> >    1: org.apache.hadoop.hdfs.TestDFSClientRetries.testNamenodeRestart
> > [yzhang@localhost jenkinsftf]$
> >
> >
> >
> > On Thu, Aug 28, 2014 at 8:04 PM, Yongjun Zhang <yzh...@cloudera.com> wrote:
> >
> >> Hi,
> >>
> >> I just noticed that the recent Jenkins test reports don't include a link to
> >> the test results; however, the email notice does show the failed tests:
> >>
> >> E.g.
> >>
> >> https://builds.apache.org/job/PreCommit-HDFS-Build/7846//
> >>
> >> Example old job report that has the link:
> >>
> >> https://builds.apache.org/job/PreCommit-HDFS-Build/7590/
> >>
> >> Would anyone please take a look?
> >>
> >> Thanks a lot.
> >>
> >> --Yongjun
> >>
> >> On Thu, Aug 28, 2014 at 4:21 PM, Karthik Kambatla <ka...@cloudera.com> wrote:
> >>
> >>> Thanks Giri and Ted for fixing the builds.
> >>>
> >>>
> >>> On Thu, Aug 28, 2014 at 9:49 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> >>>
> >>>> Charles:
> >>>> QA build is running for your JIRA:
> >>>> https://builds.apache.org/job/PreCommit-hdfs-Build/7828/parameters/
> >>>>
> >>>> Cheers
> >>>>
> >>>>
> >>>> On Thu, Aug 28, 2014 at 9:41 AM, Charles Lamb <cl...@cloudera.com> wrote:
> >>>>
> >>>>> On 8/28/2014 12:07 PM, Giridharan Kesavan wrote:
> >>>>>
> >>>>>> Fixed all 3 pre-commit builds. test-patch's git reset --hard was
> >>>>>> removing the patchprocess dir, so I moved it off the workspace.
> >>>>> Thanks Giri. Should I resubmit HDFS-6954's patch? I've gotten 3 or 4
> >>>>> Jenkins messages that indicated the problem, so something is
> >>>>> resubmitting, but now that you've fixed it, should I resubmit it again?
> >>>>>
> >>>>> Charles
> >>
> >>
>
