[ https://issues.apache.org/jira/browse/KUDU-3651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17935620#comment-17935620 ]
ASF subversion and git services commented on KUDU-3651:
-------------------------------------------------------

Commit e9b3dae78edd84b4630e77601f285476ecae2f38 in kudu's branch refs/heads/master from Alexey Serbin
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=e9b3dae78 ]

KUDU-3651 fix race condition in TabletReplica::Stop()

This patch addresses a race condition in TabletReplica::Stop(). Before
this patch, new operations might be accepted by a tablet replica right
after calling OpTracker::WaitForAllToFinish() and before completing the
shutdown of the replica's prepare pool token.

The race has been manifesting itself at least as a flakiness in various
test scenarios in txn_participant-test [1]. In one particular instance,
the following TSAN warnings were issued while running the
TxnParticipantTest.TestBeginCommitAnchorsOnFlush scenario:

  WARNING: ThreadSanitizer: data race (pid=4116)
    Write of size 8 at 0x7b4400027688 by main thread:
      #0 std::__1::__vector_base<kudu::MemTracker*, std::__1::allocator<kudu::MemTracker*> >::__destruct_at_end(kudu::MemTracker**)
      ...
      #3 std::__1::vector<kudu::MemTracker*, std::__1::allocator<kudu::MemTracker*> >::~vector()
      #4 kudu::MemTracker::~MemTracker() mem_tracker.cc:83:1
      ...
      #9 kudu::tablet::OpTracker::~OpTracker()
      #10 kudu::tablet::TabletReplica::~TabletReplica()
      ...
      #16 scoped_refptr<kudu::tablet::TabletReplica>::reset(kudu::tablet::TabletReplica*)
      #17 kudu::tablet::TabletReplicaTestBase::RestartReplica(bool)

    Previous read of size 8 at 0x7b4400027688 by thread T20 (mutexes: write M1047222376632167904):
      #0 std::__1::vector<kudu::MemTracker*, std::__1::allocator<kudu::MemTracker*> >::end()
      #1 kudu::MemTracker::Release(long)
      #2 kudu::tablet::OpTracker::Release(kudu::tablet::OpDriver*)
      #3 kudu::tablet::OpDriver::Finalize()
      #4 kudu::tablet::OpDriver::ApplyTask()
      #5 kudu::tablet::OpDriver::ApplyAsync()::$_2::operator()()
      ...

[1] http://dist-test.cloudera.org:8080/test_drilldown?test_name=txn_participant-test

Change-Id: I993015bf73ad8fe84a864b8b3c030e1be00e26e0
Reviewed-on: http://gerrit.cloudera.org:8080/22612
Reviewed-by: Abhishek Chennaka <achenn...@cloudera.com>
Reviewed-by: Marton Greber <greber...@gmail.com>
Tested-by: Marton Greber <greber...@gmail.com>


> Race condition in TabletReplica::Stop()
> ---------------------------------------
>
>                 Key: KUDU-3651
>                 URL: https://issues.apache.org/jira/browse/KUDU-3651
>             Project: Kudu
>          Issue Type: Bug
>          Components: master, tablet, tserver
>    Affects Versions: 0.7.0, 0.7.1, 0.8.0, 0.9.0, 0.9.1, 0.10.0, 1.0.0, 1.0.1,
>                      1.1.0, 1.2.0, 1.3.0, 1.3.1, 1.4.0, 1.5.0, 1.6.0, 1.7.0,
>                      1.8.0, 1.7.1, 1.9.0, 1.10.0, 1.10.1, 1.11.0, 1.12.0,
>                      1.11.1, 1.13.0, 1.14.0, 1.15.0, 1.16.0, 1.17.0, 1.17.1
>            Reporter: Alexey Serbin
>            Priority: Major
>         Attachments: txn_participant-test.txt.xz
>
>
> There is a race condition in {{TabletReplica::Stop()}}: new operations might
> be accepted by a tablet replica right after calling
> {{OpTracker::WaitForAllToFinish()}} and before the replica's prepare pool
> token is shut down.
> The race manifests itself at least as a TSAN warning in various scenarios of
> {{txn_participant-test}}. See the attached log for a report on one
> particular instance of the race. One excerpt is below:
> {noformat}
> WARNING: ThreadSanitizer: data race (pid=4116)
>   Write of size 8 at 0x7b4400027688 by main thread:
>     #0 std::__1::__vector_base<kudu::MemTracker*, std::__1::allocator<kudu::MemTracker*> >::__destruct_at_end(kudu::MemTracker**)
>     ...
>     #3 std::__1::vector<kudu::MemTracker*, std::__1::allocator<kudu::MemTracker*> >::~vector()
>     #4 kudu::MemTracker::~MemTracker() mem_tracker.cc:83:1
>     ...
>     #9 kudu::tablet::OpTracker::~OpTracker()
>     #10 kudu::tablet::TabletReplica::~TabletReplica()
>     ...
>     #16 scoped_refptr<kudu::tablet::TabletReplica>::reset(kudu::tablet::TabletReplica*)
>     #17 kudu::tablet::TabletReplicaTestBase::RestartReplica(bool)
>
>   Previous read of size 8 at 0x7b4400027688 by thread T20 (mutexes: write M1047222376632167904):
>     #0 std::__1::vector<kudu::MemTracker*, std::__1::allocator<kudu::MemTracker*> >::end()
>     #1 kudu::MemTracker::Release(long)
>     #2 kudu::tablet::OpTracker::Release(kudu::tablet::OpDriver*)
>     #3 kudu::tablet::OpDriver::Finalize()
>     #4 kudu::tablet::OpDriver::ApplyTask()
>     #5 kudu::tablet::OpDriver::ApplyAsync()::$_2::operator()()
>     ...
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
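
The ordering problem described in the issue (operations admitted after the wait for in-flight operations has completed) can be illustrated with a generic sketch. This is not Kudu code and does not claim to reproduce the actual patch: SketchTracker, TryAdd(), CloseAndWait() and AdmittedAfterClose() are hypothetical names chosen for illustration. The assumption shown is simply that admission is closed under the same lock, and before, the wait for outstanding operations, so nothing can slip in between the wait and the teardown.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for an operation tracker; NOT Kudu's OpTracker.
class SketchTracker {
 public:
  // Registers a new operation. Returns false once admission has been closed.
  bool TryAdd() {
    std::lock_guard<std::mutex> l(mu_);
    if (closed_) return false;
    ++pending_;
    return true;
  }

  // Marks one previously registered operation as finished.
  void Release() {
    std::lock_guard<std::mutex> l(mu_);
    if (--pending_ == 0) cv_.notify_all();
  }

  // Closes admission first, then waits for in-flight operations to drain.
  // Both steps happen under the same mutex that TryAdd() takes.
  void CloseAndWait() {
    std::unique_lock<std::mutex> l(mu_);
    closed_ = true;
    cv_.wait(l, [this] { return pending_ == 0; });
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool closed_ = false;
  int pending_ = 0;
};

// Registers one operation, closes the tracker while it is in flight, then
// lets `n` threads try to register more. Returns how many were admitted
// after the close, or -1 if the initial registration failed.
int AdmittedAfterClose(int n) {
  SketchTracker tracker;
  if (!tracker.TryAdd()) return -1;
  std::thread finisher([&tracker] { tracker.Release(); });
  tracker.CloseAndWait();  // Returns only after the in-flight op is released.
  finisher.join();

  std::atomic<int> admitted{0};
  std::vector<std::thread> late;
  for (int i = 0; i < n; ++i) {
    late.emplace_back([&tracker, &admitted] {
      if (tracker.TryAdd()) {
        ++admitted;
        tracker.Release();
      }
    });
  }
  for (auto& t : late) t.join();
  return admitted.load();
}
```

In this sketch, calling AdmittedAfterClose(4) yields 0 because every late TryAdd() observes the closed flag. If the flag were instead set only after the wait returned, a late operation could be admitted and later touch state that is already being destroyed, which is the shape of the TSAN report quoted above.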