[ https://issues.apache.org/jira/browse/KUDU-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17890271#comment-17890271 ]
Alexey Serbin commented on KUDU-3620:
-------------------------------------

As for the reproduction, I was able to hit the ASAN/UBSAN issue 11 times out of about 1650 iterations of the test, running the bits built for the ASAN configuration on an Ubuntu 20 machine with 8 CPU cores, like below:

{noformat}
./bin/ts_recovery-itest --gtest_filter='DifferentFaultPoints/Kudu969Test.Test/*' --gtest_repeat=10000 > /tmp/ts_recovery-itest.log 2>&1
{noformat}

Out of the 11 reported issues, 5 were reported by ASAN and 6 were reported by UBSAN.

> Race condition in OpDriver::ReplicationFinished()
> -------------------------------------------------
>
>                 Key: KUDU-3620
>                 URL: https://issues.apache.org/jira/browse/KUDU-3620
>             Project: Kudu
>          Issue Type: Bug
>          Components: master, tserver
>            Reporter: Alexey Serbin
>            Priority: Major
>         Attachments: ts_recovery-itest.asan.txt.xz, ts_recovery-itest.sigsegv.txt.xz, ts_recovery-itest.ubsan.log.xz
>
>
> There is a race condition in {{OpDriver::ReplicationFinished}} that, with
> [1b99da532|https://github.com/apache/kudu/commit/1b99da532f52d143c46440c3903785d642fb45a3],
> manifests itself in the following ways when running ts_recovery-itest:
> # A tablet server crashes with SIGSEGV (DEBUG builds and probably RELEASE builds as well)
> # The address sanitizer issues warnings (ASAN builds)
> ## The AddressSanitizer reports a heap-use-after-free error
> ## The UndefinedBehaviorSanitizer reports a run-time error due to invalid vptr
>
> Full logs are attached.
> The stack trace for item 1: > {noformat} > *** Aborted at 1727269462 (unix time) try "date -d @1727269462" if you are > using GNU date *** > PC: @ 0x0 (unknown) > *** SIGSEGV (@0x30) received by PID 14694 (TID 0x7f734f91b700) from PID 48; > stack trace: *** > @ 0x7f73830a5980 (unknown) at ??:0 > @ 0x7f73848b3db6 kudu::tablet::OpState::tablet_replica() at ??:0 > @ 0x7f73848d55c3 kudu::tablet::OpDriver::ReplicationFinished() at ??:0 > @ 0x7f73848aa27e > _ZZN4kudu6tablet13TabletReplica15StartFollowerOpERK13scoped_refptrINS_9consensus14ConsensusRoundEEENKUlRKNS_6StatusEE_clESA_ > at ??:0 > @ 0x7f73848b0f41 > _ZNSt17_Function_handlerIFvRKN4kudu6StatusEEZNS0_6tablet13TabletReplica15StartFollowerOpERK13scoped_refptrINS0_9consensus14ConsensusRoundEEEUlS3_E_E9_M_invokeERKSt9_Any_dataS3_ > at ??:0 > @ 0x7f7386351325 std::function<>::operator()() at ??:0 > @ 0x7f7384407f2b > kudu::consensus::ConsensusRound::NotifyReplicationFinished() at ??:0 > @ 0x7f73843d774b > kudu::consensus::PendingRounds::AdvanceCommittedIndex() at ??:0 > @ 0x7f73843f6888 kudu::consensus::RaftConsensus::UpdateReplica() at > ??:0 > @ 0x7f73843f1ef5 kudu::consensus::RaftConsensus::Update() at ??:0 > @ 0x7f7385467de7 > kudu::tserver::ConsensusServiceImpl::UpdateConsensus() at ??:0 > @ 0x7f7383c95fd2 > _ZZN4kudu9consensus18ConsensusServiceIfC4ERK13scoped_refptrINS_12MetricEntityEERKS2_INS_3rpc13ResultTrackerEEENKUlPKN6google8protobuf7MessageEPSE_PNS7_10RpcContextEE0_clESG_SH_SJ_ > at ??:0 > @ 0x7f7383c9a063 > _ZNSt17_Function_handlerIFvPKN6google8protobuf7MessageEPS2_PN4kudu3rpc10RpcContextEEZNS6_9consensus18ConsensusServiceIfC4ERK13scoped_refptrINS6_12MetricEntityEERKSD_INS7_13ResultTrackerEEEUlS4_S5_S9_E0_E9_M_invokeERKSt9_Any_dataOS4_OS5_OS9_ > at ??:0 > @ 0x7f73834af4b8 std::function<>::operator()() at ??:0 > @ 0x7f73834aed6c kudu::rpc::GeneratedServiceIf::Handle() at ??:0 > @ 0x7f73834b1a7d kudu::rpc::ServicePool::RunThread() at ??:0 > @ 0x7f73834b03c7 _ZZN4kudu3rpc11ServicePool4InitEiENKUlvE_clEv at 
??:0 > @ 0x7f73834b1e06 > _ZNSt17_Function_handlerIFvvEZN4kudu3rpc11ServicePool4InitEiEUlvE_E9_M_invokeERKSt9_Any_data > at ??:0 > @ 0x55ab245f526e std::function<>::operator()() at ??:0 > @ 0x7f7382853bb1 kudu::Thread::SuperviseThread() at ??:0 > @ 0x7f738309a6db start_thread at ??:0 > @ 0x7f73805ae71f clone at ??:0 > {noformat} > A sample of output for item 2.1: > {noformat} > ==26864==ERROR: AddressSanitizer: heap-use-after-free on address > 0x617000212830 at pc 0x7fd36dc2c636 bp 0x7fd32f986530 sp 0x7fd32f986528 > READ of size 8 at 0x617000212830 thread T84 (rpc worker-2694) > #0 0x7fd36dc2c635 in kudu::tablet::OpState::tablet_replica() const > /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/tablet/ops/op.h:189:12 > #1 0x7fd36dc70732 in > kudu::tablet::OpDriver::ReplicationFinished(kudu::Status const&) > /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/tablet/ops/op_driver.cc:443:37 > #2 0x7fd36dc20493 in > kudu::tablet::TabletReplica::StartFollowerOp(scoped_refptr<kudu::consensus::ConsensusRound> > const&)::$_7::operator()(kudu::Status const&) const > /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/tablet/tablet_replica.cc:857:51 > #3 0x7fd36dc202fc in std::_Function_handler<void (kudu::Status const&), > kudu::tablet::TabletReplica::StartFollowerOp(scoped_refptr<kudu::consensus::ConsensusRound> > const&)::$_7>::_M_invoke(std::_Any_data const&, kudu::Status const&) > ../../../include/c++/7.5.0/bits/std_function.h:316:2 > #4 0x7fd37460bd0d in std::function<void (kudu::Status > const&)>::operator()(kudu::Status const&) const > ../../../include/c++/7.5.0/bits/std_function.h:706:14 > #5 0x7fd36c940afc in > kudu::consensus::ConsensusRound::NotifyReplicationFinished(kudu::Status > const&) > /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/consensus/raft_consensus.cc:3311:3 > #6 0x7fd36c8cdbbc in > kudu::consensus::PendingRounds::AdvanceCommittedIndex(long) > 
/home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/consensus/pending_rounds.cc:185:12 > #7 0x7fd36c916f16 in > kudu::consensus::RaftConsensus::UpdateReplica(kudu::consensus::ConsensusRequestPB > const*, kudu::consensus::ConsensusResponsePB*) > /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/consensus/raft_consensus.cc:1530:5 > #8 0x7fd36c914e57 in > kudu::consensus::RaftConsensus::Update(kudu::consensus::ConsensusRequestPB > const*, kudu::consensus::ConsensusResponsePB*) > /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/consensus/raft_consensus.cc:1097:14 > #9 0x7fd3705ec7ad in > kudu::tserver::ConsensusServiceImpl::UpdateConsensus(kudu::consensus::ConsensusRequestPB > const*, kudu::consensus::ConsensusResponsePB*, kudu::rpc::RpcContext*) > /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/tserver/tablet_service.cc:1764:25 > #10 0x7fd36ace9b56 in > kudu::consensus::ConsensusServiceIf::ConsensusServiceIf(scoped_refptr<kudu::MetricEntity> > const&, scoped_refptr<kudu::rpc::ResultTracker> > const&)::$_1::operator()(google::protobuf::Message const*, > google::protobuf::Message*, kudu::rpc::RpcContext*) const > /home/jenkins-slave/workspace/build_and_test_flaky@2/build/asan/src/kudu/consensus/consensus.service.cc:299:13 > #11 0x7fd36ace9885 in std::_Function_handler<void > (google::protobuf::Message const*, google::protobuf::Message*, > kudu::rpc::RpcContext*), > kudu::consensus::ConsensusServiceIf::ConsensusServiceIf(scoped_refptr<kudu::MetricEntity> > const&, scoped_refptr<kudu::rpc::ResultTracker> > const&)::$_1>::_M_invoke(std::_Any_data const&, google::protobuf::Message > const*&&, google::protobuf::Message*&&, kudu::rpc::RpcContext*&&) > ../../../include/c++/7.5.0/bits/std_function.h:316:2 > #12 0x7fd367dc924e in std::function<void (google::protobuf::Message > const*, google::protobuf::Message*, > kudu::rpc::RpcContext*)>::operator()(google::protobuf::Message const*, > google::protobuf::Message*, 
kudu::rpc::RpcContext*) const > ../../../include/c++/7.5.0/bits/std_function.h:706:14 > #13 0x7fd367dc812e in > kudu::rpc::GeneratedServiceIf::Handle(kudu::rpc::InboundCall*) > /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/rpc/service_if.cc:137:3 > #14 0x7fd367dce365 in kudu::rpc::ServicePool::RunThread() > /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/rpc/service_pool.cc:229:15 > #15 0x7fd367dcec8f in > kudu::rpc::ServicePool::Init(int)::$_0::operator()() const > /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/rpc/service_pool.cc:92:5 > #16 0x7fd367dceab8 in std::_Function_handler<void (), > kudu::rpc::ServicePool::Init(int)::$_0>::_M_invoke(std::_Any_data const&) > ../../../include/c++/7.5.0/bits/std_function.h:316:2 > #17 0xa86d2c in std::function<void ()>::operator()() const > ../../../include/c++/7.5.0/bits/std_function.h:706:14 > #18 0x7fd36108db5d in kudu::Thread::SuperviseThread(void*) > /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/util/thread.cc:693:3 > #19 0x7fd36446b6da in start_thread > (/lib/x86_64-linux-gnu/libpthread.so.0+0x76da) > #20 0x7fd35d1fa71e in clone (/lib/x86_64-linux-gnu/libc.so.6+0x12171e) > 0x617000212830 is located 48 bytes inside of 688-byte region > [0x617000212800,0x617000212ab0) > freed by thread T140 (apply [worker]-) here: > #0 0x9557b0 in operator delete(void*) > /home/jenkins-slave/workspace/build_and_test_flaky@2/thirdparty/src/llvm-11.0.0.src/projects/compiler-rt/l > ib/asan/asan_new_delete.cpp:160 > #1 0x7fd36dca4f0a in kudu::tablet::WriteOpState::~WriteOpState() > /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/tablet/ops/write_ > op.cc:665:31 > #2 0x7fd37472bf41 in > std::default_delete<kudu::tablet::WriteOpState>::operator()(kudu::tablet::WriteOpState*) > const ../../../include/c++/7.5.0/bits/unique_ptr.h:78:2 > #3 0x7fd37471974b in std::unique_ptr<kudu::tablet::WriteOpState, > std::default_delete<kudu::tablet::WriteOpState> >::~unique_ptr() > 
../../../include/c++/7.5.0/bits/unique_ptr.h:263:4 > #4 0x7fd36dca9c64 in kudu::tablet::WriteOp::~WriteOp() > /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/tablet/ops/write_op.h:345:7 > #5 0x7fd36dca9ca2 in kudu::tablet::WriteOp::~WriteOp() > /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/tablet/ops/write_op.h:345:7 > #6 0x7fd36dc348d1 in > std::default_delete<kudu::tablet::Op>::operator()(kudu::tablet::Op*) const > ../../../include/c++/7.5.0/bits/unique_ptr.h:78:2 > #7 0x7fd36dc2700b in std::unique_ptr<kudu::tablet::Op, > std::default_delete<kudu::tablet::Op> >::~unique_ptr() > ../../../include/c++/7.5.0/bits/unique_ptr.h:263:4 > #8 0x7fd36dc44252 in kudu::tablet::OpDriver::~OpDriver() > /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/tablet/ops/op_driver.h:304:16 > #9 0x7fd36dc4421a in kudu::RefCountedThreadSafe<kudu::tablet::OpDriver, > kudu::DefaultRefCountedThreadSafeTraits<kudu::tablet::OpDriver> > >::DeleteInternal(kudu::tablet::OpDriver const*) > /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/gutil/ref_counted.h:153:44 > #10 0x7fd36dc441f0 in > kudu::DefaultRefCountedThreadSafeTraits<kudu::tablet::OpDriver>::Destruct(kudu::tablet::OpDriver > const*) > /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/gutil/ref_counted.h:116:5 > #11 0x7fd36dc441be in kudu::RefCountedThreadSafe<kudu::tablet::OpDriver, > kudu::DefaultRefCountedThreadSafeTraits<kudu::tablet::OpDriver> >::Release() > const > /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/gutil/ref_counted.h:144:7 > #12 0x7fd36dc270e7 in > scoped_refptr<kudu::tablet::OpDriver>::~scoped_refptr() > /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/gutil/ref_counted.h:266:13 > #13 0x7fd36dc71f53 in kudu::tablet::OpDriver::ApplyTask() > /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/tablet/ops/op_driver.cc:563:1 > #14 0x7fd36dc74ccb in > kudu::tablet::OpDriver::ApplyAsync()::$_2::operator()() const > 
/home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/tablet/ops/op_driver.cc:504:47 > #15 0x7fd36dc74b48 in std::_Function_handler<void (), > kudu::tablet::OpDriver::ApplyAsync()::$_2>::_M_invoke(std::_Any_data const&) > ../../../include/c++/7.5.0/bits/std_function.h:316:2 > #16 0xa86d2c in std::function<void ()>::operator()() const > ../../../include/c++/7.5.0/bits/std_function.h:706:14 > #17 0x7fd3610af604 in kudu::ThreadPool::DispatchThread() > /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/util/threadpool.cc:776:7 > #18 0x7fd3610b2c2b in kudu::ThreadPool::CreateThread()::$_2::operator()() > const > /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/util/threadpool.cc:849:48 > #19 0x7fd3610b2aa8 in std::_Function_handler<void (), > kudu::ThreadPool::CreateThread()::$_2>::_M_invoke(std::_Any_data const&) > ../../../include/c++/7.5.0/bits/std_function.h:316:2 > #20 0xa86d2c in std::function<void ()>::operator()() const > ../../../include/c++/7.5.0/bits/std_function.h:706:14 > #21 0x7fd36108db5d in kudu::Thread::SuperviseThread(void*) > /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/util/thread.cc:693:3 > #22 0x7fd36446b6da in start_thread > (/lib/x86_64-linux-gnu/libpthread.so.0+0x76da) > {noformat} > A sample output for item 2.2: > {noformat} > /root/Projects/kudu/src/kudu/tablet/ops/op.h:189:12: runtime error: member > access within address 0x617000118e80 which does not point to an object of > type 'const kudu::tablet::OpState' > 0x617000118e80: note: object has invalid vptr > 7f 00 80 6c 78 00 00 7e 44 7f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 00 00 00 00 00 00 00 00 > ^~~~~~~~~~~~~~~~~~~~~~~ > invalid vptr > #0 0x7f44c3e91f3d in kudu::tablet::OpState::tablet_replica() const > /root/Projects/kudu/src/kudu/tablet/ops/op.h:189:12 > #1 0x7f44c3ed5762 in > kudu::tablet::OpDriver::ReplicationFinished(kudu::Status const&) > /root/Projects/kudu/src/kudu/tablet/ops/op_driver.cc:443:37 > #2 0x7f44c3e85ca3 in > 
kudu::tablet::TabletReplica::StartFollowerOp(scoped_refptr<kudu::consensus::ConsensusRound> > const&)::$_7::operator()(kudu::Status const&) const > /root/Projects/kudu/src/kudu/tablet/tablet_replica.cc:857:51 > #3 0x7f44c3e85b0c in std::_Function_handler<void (kudu::Status const&), > kudu::tablet::TabletReplica::StartFollowerOp(scoped_refptr<kudu::consensus::ConsensusRound> > const&)::$_7>::_M_invoke(std::_Any_data const&, kudu::Status const&) > ../../../include/c++/9/bits/std_function.h:300:2 > #4 0x7f44ca5fa80d in std::function<void (kudu::Status > const&)>::operator()(kudu::Status const&) const > ../../../include/c++/9/bits/std_function.h:688:14 > #5 0x7f44c2bd051c in > kudu::consensus::ConsensusRound::NotifyReplicationFinished(kudu::Status > const&) /root/Projects/kudu/src/kudu/consensus/raft_consensus.cc:3311:3 > #6 0x7f44c2b5cf49 in > kudu::consensus::PendingRounds::AdvanceCommittedIndex(long) > /root/Projects/kudu/src/kudu/consensus/pending_rounds.cc:187:12 > #7 0x7f44c2ba6498 in > kudu::consensus::RaftConsensus::UpdateReplica(kudu::consensus::ConsensusRequestPB > const*, kudu::consensus::ConsensusResponsePB*) > /root/Projects/kudu/src/kudu/consensus/raft_consensus.cc:1530:5 > #8 0x7f44c2ba43a7 in > kudu::consensus::RaftConsensus::Update(kudu::consensus::ConsensusRequestPB > const*, kudu::consensus::ConsensusResponsePB*) > /root/Projects/kudu/src/kudu/consensus/raft_consensus.cc:1097:14 > #9 0x7f44c675861d in > kudu::tserver::ConsensusServiceImpl::UpdateConsensus(kudu::consensus::ConsensusRequestPB > const*, kudu::consensus::ConsensusResponsePB*, kudu::rpc::RpcContext*) > /root/Projects/kudu/src/kudu/tserver/tablet_service.cc:1764:25 > #10 0x7f44c0fe2086 in > kudu::consensus::ConsensusServiceIf::ConsensusServiceIf(scoped_refptr<kudu::MetricEntity> > const&, scoped_refptr<kudu::rpc::ResultTracker> > const&)::$_1::operator()(google::protobuf::Message const*, > google::protobuf::Message*, kudu::rpc::RpcContext*) const > 
/root/Projects/kudu/build/master.asan/src/kudu/consensus/consensus.service.cc:299:13 > #11 0x7f44c0fe1db5 in std::_Function_handler<void > (google::protobuf::Message const*, google::protobuf::Message*, > kudu::rpc::RpcContext*), > kudu::consensus::ConsensusServiceIf::ConsensusServiceIf(scoped_refptr<kudu::MetricEntity> > const&, scoped_refptr<kudu::rpc::ResultTracker> > const&)::$_1>::_M_invoke(std::_Any_data const&, google::protobuf::Message > const*&&, google::protobuf::Message*&&, kudu::rpc::RpcContext*&&) > ../../../include/c++/9/bits/std_function.h:300:2 > #12 0x7f44be4e5c6e in std::function<void (google::protobuf::Message > const*, google::protobuf::Message*, > kudu::rpc::RpcContext*)>::operator()(google::protobuf::Message const*, > google::protobuf::Message*, kudu::rpc::RpcContext*) const > ../../../include/c++/9/bits/std_function.h:688:14 > #13 0x7f44be4e4b4e in > kudu::rpc::GeneratedServiceIf::Handle(kudu::rpc::InboundCall*) > /root/Projects/kudu/src/kudu/rpc/service_if.cc:137:3 > #14 0x7f44be4eac35 in kudu::rpc::ServicePool::RunThread() > /root/Projects/kudu/src/kudu/rpc/service_pool.cc:229:15 > #15 0x7f44be4eb55f in > kudu::rpc::ServicePool::Init(int)::$_0::operator()() const > /root/Projects/kudu/src/kudu/rpc/service_pool.cc:92:5 > #16 0x7f44be4eb388 in std::_Function_handler<void (), > kudu::rpc::ServicePool::Init(int)::$_0>::_M_invoke(std::_Any_data const&) > ../../../include/c++/9/bits/std_function.h:300:2 > #17 0xa097bc in std::function<void ()>::operator()() const > ../../../include/c++/9/bits/std_function.h:688:14 > #18 0x7f44b8530a9d in kudu::Thread::SuperviseThread(void*) > /root/Projects/kudu/src/kudu/util/thread.cc:693:3 > #19 0x7f44baff3608 in start_thread > /build/glibc-LcI20x/glibc-2.31/nptl/pthread_create.c:477:8 > #20 0x7f44b665b352 in clone (/lib/x86_64-linux-gnu/libc.so.6+0x11f352) > SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior > /root/Projects/kudu/src/kudu/tablet/ops/op.h:189:12 in > {noformat} -- This message was sent by 
Atlassian Jira (v8.20.10#820010)
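
For readers less familiar with this class of bug: in the ASAN report above, an rpc worker thread reads through OpState::tablet_replica() from OpDriver::ReplicationFinished(), while the memory was freed on an apply worker thread when a scoped_refptr<OpDriver> going out of scope in OpDriver::ApplyTask() released the last reference. The sketch below illustrates the general lifetime rule behind such reports: a callback that may run after another owner lets go needs its own strong reference. It is not Kudu code and not a claim about how KUDU-3620 is or should be fixed. DemoOpState, DemoDriver, and CallbackKeepsDriverAlive are hypothetical names, std::shared_ptr stands in for Kudu's scoped_refptr, and the ordering is simulated on a single thread.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>

// Hypothetical stand-in for per-op state; not Kudu's OpState.
struct DemoOpState {
  std::string replica_name = "replica-1";
};

// Hypothetical stand-in for a ref-counted driver that owns the op state.
class DemoDriver : public std::enable_shared_from_this<DemoDriver> {
 public:
  DemoDriver() : state_(std::make_unique<DemoOpState>()) {}

  // Builds a "replication finished" callback that holds its own strong
  // reference, so the driver (and the state it owns) outlives any other
  // owner that releases its reference first.
  std::function<std::string()> MakeReplicationCallback() {
    std::shared_ptr<DemoDriver> self = shared_from_this();
    return [self]() {
      assert(self->state_ != nullptr);  // state is alive while self is held
      return self->state_->replica_name;
    };
  }

 private:
  std::unique_ptr<DemoOpState> state_;
};

// Simulates the ordering seen in the report: the "apply" side drops its
// reference before the "replication finished" callback runs. Returns true
// if the driver was still alive for the callback and was destroyed only
// after the callback itself was released.
bool CallbackKeepsDriverAlive() {
  auto driver = std::make_shared<DemoDriver>();
  std::weak_ptr<DemoDriver> observer = driver;
  std::function<std::string()> callback = driver->MakeReplicationCallback();

  driver.reset();  // the "apply" side releases its reference
  const bool alive_before_call = !observer.expired();
  const std::string name = callback();  // safe: the callback co-owns the driver
  callback = nullptr;  // dropping the callback releases the last reference
  return alive_before_call && name == "replica-1" && observer.expired();
}
```

Calling CallbackKeepsDriverAlive() from a test or from main() exercises the whole sequence. If the lambda captured a raw pointer instead of the shared_ptr, the driver.reset() call would destroy the driver and the later callback() would read freed memory, which is the kind of access ASAN flags as heap-use-after-free.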