[ https://issues.apache.org/jira/browse/KUDU-3237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17273819#comment-17273819 ]
ASF subversion and git services commented on KUDU-3237:
-------------------------------------------------------

Commit cbb8db7711785e85e7c74fe86b93d7c1ba9c9c24 in kudu's branch refs/heads/master from Alexey Serbin
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=cbb8db7 ]

KUDU-3237 fix MaintenanceManagerTest.TestCompletedOpsHistory

This patch fixes flakiness in the TestCompletedOpsHistory scenario.
The flakiness is a test-only issue which became apparent as a result
of the recent changes introduced into the MaintenanceManager with the
9e4664d44 changelist. In essence, with finer-grained locking in the
scoped cleanup of the MaintenanceManager::LaunchOp() method, the test
thread calling MaintenanceManager::GetMaintenanceManagerStatusDump()
has a slight chance of acquiring 'completed_ops_lock_' ahead of the
thread executing the code in LaunchOp()'s scoped cleanup.

This patch wraps the related code in ASSERT_EVENTUALLY to resolve the
test-only race condition mentioned above.

I verified that this patch fixes the issue by running the test
scenario multiple times under dist-test (RELEASE build).

  Before: 2 out of 256 runs failed
    http://dist-test.cloudera.org//job?job_id=aserbin.1611806979.74192
  After:  0 out of 256 runs failed
    http://dist-test.cloudera.org//job?job_id=aserbin.1611809676.95320

Change-Id: I760287b3ed4d50e32d2f9257e5390fdf8fa8f288
Reviewed-on: http://gerrit.cloudera.org:8080/16991
Tested-by: Alexey Serbin <aser...@cloudera.com>
Reviewed-by: Alexey Serbin <aser...@cloudera.com>


> MaintenanceManagerTest.TestCompletedOpsHistory is flaky
> -------------------------------------------------------
>
>                 Key: KUDU-3237
>                 URL: https://issues.apache.org/jira/browse/KUDU-3237
>             Project: Kudu
>          Issue Type: Test
>            Reporter: Hao Hao
>            Priority: Major
>
> Came across a test failure in MaintenanceManagerTest.TestCompletedOpsHistory
> as follows:
> {noformat}
> I0125 19:55:10.782884 24454 maintenance_manager.cc:594] P 12345: op5 complete. Timing: real 0.000s user 0.000s sys 0.000s Metrics: {}
> /data/1/hao/Repositories/kudu/src/kudu/util/maintenance_manager-test.cc:525: Failure
> Expected: std::min(kHistorySize, i + 1)
>       Which is: 6
> To be equal to: status_pb.completed_operations_size()
>       Which is: 5
> I0125 19:55:10.783524 24420 test_util.cc:148] -----------------------------------------------
> I0125 19:55:10.783561 24420 test_util.cc:149] Had fatal failures, leaving test files at /tmp/dist-test-task1ofSWE/test-tmp/maintenance_manager-test.0.MaintenanceManagerTest.TestCompletedOpsHistory.1611604508702756-24420
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
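The race described in the commit message, and why retrying the assertion resolves it, can be illustrated with a small standalone C++ model. This is a sketch under stated assumptions, not Kudu's code: CompletedOpsHistory, SimulateLaunchOp(), Eventually() and RunScenario() are hypothetical names. The mutex stands in for 'completed_ops_lock_', the std::promise stands in for whatever tells the test thread that the op has run, and Eventually() only imitates the retry-until-deadline idea behind Kudu's ASSERT_EVENTUALLY macro (the real macro's timeout and backoff may differ).

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <deque>
#include <functional>
#include <future>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical, simplified stand-in for the completed-ops history.
class CompletedOpsHistory {
 public:
  explicit CompletedOpsHistory(size_t capacity) : capacity_(capacity) {
    assert(capacity > 0);
  }
  void Add(std::string name) {
    std::lock_guard<std::mutex> l(lock_);
    if (ops_.size() == capacity_) ops_.pop_front();  // keep newest entries
    ops_.push_back(std::move(name));
  }
  size_t Size() const {
    std::lock_guard<std::mutex> l(lock_);
    return ops_.size();
  }
 private:
  const size_t capacity_;
  mutable std::mutex lock_;  // plays the role of 'completed_ops_lock_'
  std::deque<std::string> ops_;
};

// Hypothetical model of the op thread: the waiter is released *before*
// the "scoped cleanup" records the op, which opens the race window.
std::thread SimulateLaunchOp(CompletedOpsHistory* history, std::string name,
                             std::promise<void>* performed) {
  return std::thread([history, name = std::move(name), performed]() {
    performed->set_value();  // op body done; the test thread may wake now
    // Artificial delay, only to make the window easy to observe.
    std::this_thread::sleep_for(std::chrono::milliseconds(5));
    history->Add(name);      // scoped cleanup takes the lock and records it
  });
}

// Hypothetical ASSERT_EVENTUALLY-like helper: re-evaluates 'check' with
// capped exponential backoff until it holds or 'timeout' elapses.
bool Eventually(const std::function<bool()>& check,
                std::chrono::milliseconds timeout) {
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  std::chrono::milliseconds backoff(1);
  while (true) {
    if (check()) return true;
    if (std::chrono::steady_clock::now() >= deadline) return false;
    std::this_thread::sleep_for(backoff);
    backoff = std::min<std::chrono::milliseconds>(
        backoff * 2, std::chrono::milliseconds(100));
  }
}

// Runs 'num_ops' ops; after each one the history is expected to hold
// min(capacity, i + 1) entries. With 'retry' false the check runs once
// (racy); with 'retry' true it is wrapped in Eventually().
bool RunScenario(size_t capacity, int num_ops, bool retry) {
  CompletedOpsHistory history(capacity);
  bool all_ok = true;
  for (int i = 0; i < num_ops; ++i) {
    std::promise<void> performed;
    std::future<void> done = performed.get_future();
    std::thread worker =
        SimulateLaunchOp(&history, "op" + std::to_string(i), &performed);
    done.wait();
    const size_t expected = std::min(capacity, static_cast<size_t>(i) + 1);
    auto check = [&] { return history.Size() == expected; };
    const bool ok = retry ? Eventually(check, std::chrono::seconds(5)) : check();
    all_ok = all_ok && ok;
    worker.join();  // 'performed' must outlive the worker's use of it
  }
  return all_ok;
}
```

With retry enabled, RunScenario(4, 6, true) should pass; with retry disabled, the single check can observe the history one entry short, the same shape of symptom as "Which is: 5" versus "Which is: 6" in the log above. The design choice mirrors the commit: the production ordering is left alone and only the test tolerates the brief delay before the history reflects the finished op.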