[ 
https://issues.apache.org/jira/browse/KUDU-3237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17273819#comment-17273819
 ] 

ASF subversion and git services commented on KUDU-3237:
-------------------------------------------------------

Commit cbb8db7711785e85e7c74fe86b93d7c1ba9c9c24 in kudu's branch 
refs/heads/master from Alexey Serbin
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=cbb8db7 ]

KUDU-3237 fix MaintenanceManagerTest.TestCompletedOpsHistory

This patch fixes flakiness in the TestCompletedOpsHistory scenario.
The flakiness is a test-only issue that became apparent after the
recent changes introduced into the MaintenanceManager with changelist
9e4664d44.  In essence, with finer granularity of locking in the
scoped cleanup of the MaintenanceManager::LaunchOp() method, the test
thread calling MaintenanceManager::GetMaintenanceManagerStatusDump()
has a slight chance of acquiring 'completed_ops_lock_' ahead of the
thread executing LaunchOp()'s scoped cleanup.  In that case the status
dump does not yet include the operation that has just completed.
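
The ordering described above can be sketched with a toy model.  This
is only an illustration under stated assumptions, not Kudu's code:
ToyManager, MarkOpDone(), RecordCompleted(), CompletedCount() and
DemoStaleRead() are hypothetical names, and the two steps are run
sequentially to make the unlucky interleaving deterministic.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a manager that tracks running and completed
// operations under two separate locks. Not Kudu's MaintenanceManager.
class ToyManager {
 public:
  explicit ToyManager(int running_ops) : running_ops_(running_ops) {}

  // Step 1: the operation stops being "running". A waiting test thread
  // could be released at this point.
  void MarkOpDone() {
    std::lock_guard<std::mutex> l(running_lock_);
    assert(running_ops_ > 0);
    --running_ops_;
  }

  // Step 2: the cleanup records the operation in the history. This is a
  // separate critical section, so a reader can slip in before it.
  void RecordCompleted(const std::string& name) {
    std::lock_guard<std::mutex> l(completed_ops_lock_);
    completed_ops_.push_back(name);
  }

  // What a status dump would report as the number of completed operations.
  std::size_t CompletedCount() {
    std::lock_guard<std::mutex> l(completed_ops_lock_);
    return completed_ops_.size();
  }

 private:
  std::mutex running_lock_;
  int running_ops_;
  std::mutex completed_ops_lock_;
  std::vector<std::string> completed_ops_;
};

// Replays the unlucky interleaving: the reader takes the history lock
// between step 1 and step 2, so it sees one entry fewer than expected.
std::pair<std::size_t, std::size_t> DemoStaleRead() {
  ToyManager m(1);
  m.MarkOpDone();
  const std::size_t before = m.CompletedCount();  // reader wins the lock
  m.RecordCompleted("op0");
  const std::size_t after = m.CompletedCount();   // history has caught up
  return {before, after};
}
```

In this model the first read is short by one entry, which is the same
shape as the off-by-one reported in the issue (expected 6, got 5).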

This patch wraps the related code in ASSERT_EVENTUALLY to resolve the
test-only race condition mentioned above.  I verified that this patch
fixes the issue by running the test scenario multiple times under
dist-test (RELEASE build).
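
The retry idea behind this fix can be sketched as follows.  This is a
minimal illustration, assuming a plain boolean predicate: the helper
EventuallyTrue() and the demo DemoRetry() are hypothetical names, and
this is not the ASSERT_EVENTUALLY macro from Kudu's test utilities,
whose implementation is not shown in this message.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: polls `check` until it returns true or `timeout`
// elapses. Returns whether the condition was eventually observed.
bool EventuallyTrue(
    const std::function<bool()>& check,
    std::chrono::milliseconds timeout,
    std::chrono::milliseconds interval = std::chrono::milliseconds(1)) {
  assert(timeout.count() >= 0);
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  do {
    if (check()) return true;
    std::this_thread::sleep_for(interval);
  } while (std::chrono::steady_clock::now() < deadline);
  return check();  // one last attempt after the deadline
}

// Simulates a cleanup thread that publishes the completed operation a
// little late, and a test thread that tolerates the delay by retrying.
bool DemoRetry() {
  std::atomic<int> completed{0};
  std::thread cleanup([&completed] {
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    completed.store(1);
  });
  const bool ok = EventuallyTrue(
      [&completed] { return completed.load() == 1; },
      std::chrono::seconds(5));
  cleanup.join();
  return ok;
}
```

A single immediate check could observe the stale value; retrying for a
bounded time makes the test insensitive to which thread gets the lock
first, while a condition that never holds still fails at the deadline.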

Before: 2 out of 256 runs failed
  http://dist-test.cloudera.org//job?job_id=aserbin.1611806979.74192

After : 0 out of 256 runs failed
  http://dist-test.cloudera.org//job?job_id=aserbin.1611809676.95320

Change-Id: I760287b3ed4d50e32d2f9257e5390fdf8fa8f288
Reviewed-on: http://gerrit.cloudera.org:8080/16991
Tested-by: Alexey Serbin <aser...@cloudera.com>
Reviewed-by: Alexey Serbin <aser...@cloudera.com>


> MaintenanceManagerTest.TestCompletedOpsHistory is flaky
> -------------------------------------------------------
>
>                 Key: KUDU-3237
>                 URL: https://issues.apache.org/jira/browse/KUDU-3237
>             Project: Kudu
>          Issue Type: Test
>            Reporter: Hao Hao
>            Priority: Major
>
> Came across a test failure in MaintenanceManagerTest.TestCompletedOpsHistory, 
> as follows:
> {noformat}
> I0125 19:55:10.782884 24454 maintenance_manager.cc:594] P 12345: op5 
> complete. Timing: real 0.000s    user 0.000s     sys 0.000s Metrics: {}
> /data/1/hao/Repositories/kudu/src/kudu/util/maintenance_manager-test.cc:525: 
> Failure
>       Expected: std::min(kHistorySize, i + 1)
>       Which is: 6
> To be equal to: status_pb.completed_operations_size()
>       Which is: 5
> I0125 19:55:10.783524 24420 test_util.cc:148] 
> -----------------------------------------------
> I0125 19:55:10.783561 24420 test_util.cc:149] Had fatal failures, leaving 
> test files at 
> /tmp/dist-test-task1ofSWE/test-tmp/maintenance_manager-test.0.MaintenanceManagerTest.TestCompletedOpsHistory.1611604508702756-24420
> {noformat}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
