Hi,

I developed a tool to detect flaky tests in Hadoop Jenkins test jobs, building
on the initial work Todd Lipcon did. We find it quite useful, and with Todd's
agreement, I'd like to push it upstream so all of us can share it (thanks,
Todd, for the initial work and support). I hope you find the tool useful.

This is a tool for Hadoop contributors rather than Hadoop users, and it can
certainly be adapted to projects other than Hadoop. I wonder where a good
place to put it would be. Your advice is very much appreciated.

Please see below the description and example output of the tool.

Thanks a lot.

--Yongjun

Description of the tool:

#
# Given a Jenkins test job, this script examines all runs of the job done
# within a specified period of time (the number of days prior to the
# execution time of this script), and reports all failed tests.
#
# The output of this script includes a section for each run that has failed
# tests, with each failed test name listed.
#
# More importantly, at the end, it outputs a summary section that lists all
# failed tests within all examined runs, indicates in how many runs each
# test failed, and sorts all failed tests by that count.
#
# This way, when we see failed tests in a PreCommit build, we can quickly
# tell whether a failed test is a new failure or whether it failed before
# and may just be a flaky test.
#
# Of course, to be 100% sure about the reason for a failed test, a closer
# look at the failed test for the specific run is necessary.
#

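The summary section boils down to counting, for each test, how many of the examined runs it failed in, and then sorting by that count. A minimal Python sketch of that step, assuming the per-run failures have already been collected into a dict keyed by build number; `summarize_failures` and `format_summary` are hypothetical names, not taken from the script:

```python
from collections import Counter


def summarize_failures(failures_by_run):
    """Hypothetical helper: (count, test_name) pairs, highest count first.

    failures_by_run: assumed mapping of build number -> list of failed
    test names for that run.
    """
    counts = Counter()
    for tests in failures_by_run.values():
        # Count each test at most once per run, so the number means
        # "in how many runs this test failed".
        counts.update(set(tests))
    # Sort by count (descending), breaking ties by name for stable output.
    return sorted(((n, name) for name, n in counts.items()),
                  key=lambda item: (-item[0], item[1]))


def format_summary(summary):
    """Hypothetical helper: render pairs in the '<#occurrences: testName>' style."""
    lines = ["All failed tests <#occurrences: testName>:"]
    lines.extend("    %d: %s" % (n, name) for n, name in summary)
    return "\n".join(lines)
```

Ties are broken alphabetically here only to make the output deterministic; the example outputs of the tool do not necessarily order ties that way.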
Example usage and output of the tool for job Hadoop-Common-0.23-Build,
which indicates that the same test failed five times in a row:

./determine-flaky-tests-hadoop.py -j Hadoop-Common-0.23-Build
****Recently FAILED builds in url: https://builds.apache.org//job/Hadoop-Common-0.23-Build
    THERE ARE 5 builds (out of 5) that have failed tests in the past 14 days, as listed below:

==>https://builds.apache.org/job/Hadoop-Common-0.23-Build/1057/testReport (2014-08-30 02:01:30)
    Failed test: org.apache.hadoop.io.compress.TestCodec.testSnappyCodec
==>https://builds.apache.org/job/Hadoop-Common-0.23-Build/1056/testReport (2014-08-29 02:01:30)
    Failed test: org.apache.hadoop.io.compress.TestCodec.testSnappyCodec
==>https://builds.apache.org/job/Hadoop-Common-0.23-Build/1055/testReport (2014-08-28 02:01:30)
    Failed test: org.apache.hadoop.io.compress.TestCodec.testSnappyCodec
==>https://builds.apache.org/job/Hadoop-Common-0.23-Build/1054/testReport (2014-08-27 02:01:29)
    Failed test: org.apache.hadoop.io.compress.TestCodec.testSnappyCodec
==>https://builds.apache.org/job/Hadoop-Common-0.23-Build/1053/testReport (2014-08-26 02:01:30)
    Failed test: org.apache.hadoop.io.compress.TestCodec.testSnappyCodec

All failed tests <#occurrences: testName>:
    5: org.apache.hadoop.io.compress.TestCodec.testSnappyCodec

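In this run no `-n` option was given and the window was 14 days; the next example passes `-n 7`. Selecting the runs inside such a window could look like the sketch below, which assumes build metadata shaped like the `number` and `timestamp` (start time in milliseconds since the epoch) fields of a Jenkins job's JSON API; the helper names are hypothetical:

```python
import time

# Jenkins build timestamps are assumed to be in milliseconds.
MS_PER_DAY = 24 * 60 * 60 * 1000


def builds_in_window(builds, num_days, now_ms=None):
    """Hypothetical helper: builds started in the last num_days days, newest first.

    builds: list of dicts with "number" and "timestamp" keys.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    cutoff = now_ms - num_days * MS_PER_DAY
    recent = [b for b in builds if b["timestamp"] >= cutoff]
    return sorted(recent, key=lambda b: b["number"], reverse=True)


def format_timestamp(ts_ms):
    """Hypothetical helper: local-time string such as "2014-08-30 02:01:30"."""
    return time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(ts_ms / 1000))
```

Passing `now_ms` explicitly keeps the selection reproducible, which is convenient when checking the tool's output against a fixed point in time.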

Another example (for job Hadoop-Hdfs-trunk):

[yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -n 7
****Recently FAILED builds in url: https://builds.apache.org//job/Hadoop-Hdfs-trunk
    THERE ARE 7 builds (out of 8) that have failed tests in the past 7 days, as listed below:

==>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1856/testReport (2014-08-30 09:46:54)
    Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testIdempotentAllocateBlockAndClose
    Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testFailuresArePerOperation
    Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testRetryOnChecksumFailure
    Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testWriteTimeoutAtDataNode
    Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testDFSClientRetriesOnBusyBlocks
    Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testClientDNProtocolTimeout
    Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testGetFileChecksum
    Failed test: org.apache.hadoop.hdfs.TestDFSClientRetries.testNamenodeRestart
==>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1855/testReport (2014-08-30 04:31:30)
    Failed test: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.testBalancer
    Failed test: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.testUnevenDistribution
==>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1854/testReport (2014-08-29 04:31:30)
   Could not open testReport
==>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1853/testReport (2014-08-28 09:37:18)
   Could not open testReport
==>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1852/testReport (2014-08-28 09:28:48)
   Could not open testReport
==>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1850/testReport (2014-08-27 04:31:30)
    Failed test: org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS.testEnd2End
==>https://builds.apache.org/job/Hadoop-Hdfs-trunk/1849/testReport (2014-08-26 04:31:29)
    Failed test: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithSaslDataTransfer.testBalancer0Integrity

All failed tests <#occurrences: testName>:
    1: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithSaslDataTransfer.testBalancer0Integrity
    1: org.apache.hadoop.hdfs.TestDFSClientRetries.testIdempotentAllocateBlockAndClose
    1: org.apache.hadoop.hdfs.TestDFSClientRetries.testFailuresArePerOperation
    1: org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS.testEnd2End
    1: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.testUnevenDistribution
    1: org.apache.hadoop.hdfs.TestDFSClientRetries.testRetryOnChecksumFailure
    1: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.testBalancer
    1: org.apache.hadoop.hdfs.TestDFSClientRetries.testWriteTimeoutAtDataNode
    1: org.apache.hadoop.hdfs.TestDFSClientRetries.testDFSClientRetriesOnBusyBlocks
    1: org.apache.hadoop.hdfs.TestDFSClientRetries.testClientDNProtocolTimeout
    1: org.apache.hadoop.hdfs.TestDFSClientRetries.testGetFileChecksum
    1: org.apache.hadoop.hdfs.TestDFSClientRetries.testNamenodeRestart
[yzhang@localhost jenkinsftf]$

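Builds 1852 to 1854 in this run hit the "Could not open testReport" case. One way to fetch a run's failed tests while tolerating that case is sketched below. It assumes the JSON view Jenkins serves when `api/json` is appended to a testReport URL, with `suites` holding `cases` that carry `className`, `name` and `status` fields (aggregated Maven-style reports nesting these under `childReports`). The function names are hypothetical, and this is not necessarily how the script itself does it:

```python
import json
from urllib.request import urlopen

# Case statuses the Jenkins JUnit plugin uses for failing tests:
# FAILED (still failing) and REGRESSION (newly failing).
FAILING_STATUSES = {"FAILED", "REGRESSION"}


def failed_tests_from_report(report):
    """Hypothetical helper: 'className.name' for each failing case in a testReport dict."""
    # Aggregated reports are assumed to nest per-module results under
    # "childReports"; plain reports carry "suites" at the top level.
    results = [child.get("result") or {} for child in report.get("childReports", [])]
    if not results:
        results = [report]
    failed = []
    for result in results:
        for suite in result.get("suites", []):
            for case in suite.get("cases", []):
                if case.get("status") in FAILING_STATUSES:
                    failed.append("%s.%s" % (case["className"], case["name"]))
    return failed


def fetch_failed_tests(test_report_url, timeout=30):
    """Hypothetical helper: failed test names for one run, or None if unavailable."""
    try:
        with urlopen(test_report_url.rstrip("/") + "/api/json", timeout=timeout) as resp:
            return failed_tests_from_report(json.load(resp))
    except (OSError, ValueError):
        # URLError/HTTPError and socket timeouts are OSError subclasses;
        # ValueError covers malformed URLs and JSON. Treat all of them as
        # "Could not open testReport".
        return None
```

Returning `None` rather than an empty list keeps "report unavailable" distinct from "no failures", so unreadable runs are not silently counted as clean.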


On Thu, Aug 28, 2014 at 8:04 PM, Yongjun Zhang <yzh...@cloudera.com> wrote:

> Hi,
>
> I just noticed that the recent Jenkins test reports don't include a link to
> the test results; however, the email notices do show the failed tests:
>
> E.g.
>
> https://builds.apache.org/job/PreCommit-HDFS-Build/7846//
>
> Example old job report that has the link:
>
> https://builds.apache.org/job/PreCommit-HDFS-Build/7590/
>
> Would anyone please take a look?
>
> Thanks a lot.
>
> --Yongjun
>
> On Thu, Aug 28, 2014 at 4:21 PM, Karthik Kambatla <ka...@cloudera.com> wrote:
>
>> Thanks Giri and Ted for fixing the builds.
>>
>>
>> On Thu, Aug 28, 2014 at 9:49 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>> > Charles:
>> > QA build is running for your JIRA:
>> > https://builds.apache.org/job/PreCommit-hdfs-Build/7828/parameters/
>> >
>> > Cheers
>> >
>> >
>> > On Thu, Aug 28, 2014 at 9:41 AM, Charles Lamb <cl...@cloudera.com> wrote:
>> >
>> > > On 8/28/2014 12:07 PM, Giridharan Kesavan wrote:
>> > >
>> > >> Fixed all 3 pre-commit builds. test-patch's git reset --hard is
>> > >> removing the patchprocess dir, so I moved it off the workspace.
>> > >>
>> > >>
>> > > Thanks Giri. Should I resubmit HDFS-6954's patch? I've gotten 3 or 4
>> > > Jenkins messages that indicated the problem, so something is
>> > > resubmitting, but now that you've fixed it, should I resubmit it again?
>> > >
>> > > Charles
>> > >
>> > >
>> >
>>
>
>
