[ https://issues.apache.org/jira/browse/KUDU-2439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18012655#comment-18012655 ]

ASF subversion and git services commented on KUDU-2439:
-------------------------------------------------------

Commit d1413014255df62a6e38550552a8596644aa0033 in kudu's branch 
refs/heads/master from Alexey Serbin
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=d14130142 ]

KUDU-2439 address the issue up to some extent

This changelist adds a work-around for the issue described in KUDU-2439.
It relies on the recently added provisions for explicitly calling
OPENSSL_cleanup() with OpenSSL 1.1.1 and newer.
Yes, it's a stop-gap, but it's better than nothing.
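The ordering this work-around relies on can be illustrated with a small, dependency-free C++ model. Everything named toylib below is hypothetical. It stands in for a library like OpenSSL, where (to my understanding) the corresponding real knobs are the OPENSSL_INIT_NO_ATEXIT flag to OPENSSL_init_crypto(), which suppresses the atexit() handler in the 1.1.1 series, and an explicit OPENSSL_cleanup() call. This is a sketch of the idea, not Kudu's actual change: library teardown runs only after every thread that allocated library objects has been joined.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical model of a library whose global state must outlive every
// object allocated from it. The toylib names are illustrative only.
namespace toylib {

std::atomic<int> g_live_objects{0};      // outstanding library objects
std::atomic<bool> g_cleaned_up{false};   // set once global teardown has run

// A library-allocated resource whose destructor touches global state.
struct Resource {
  Resource() { g_live_objects.fetch_add(1); }
  ~Resource() { g_live_objects.fetch_sub(1); }
  Resource(const Resource&) = delete;
  Resource& operator=(const Resource&) = delete;
};

// Explicit teardown in the spirit of OPENSSL_cleanup(): only safe once no
// resource is alive. Returns false (and tears nothing down) if called early.
bool Cleanup() {
  if (g_live_objects.load() != 0) return false;
  g_cleaned_up.store(true);
  return true;
}

}  // namespace toylib

// Workers allocate and free resources on their own threads. The application
// joins them first and only then tears the library down, instead of leaving
// teardown to an atexit() handler that might run while a worker is still
// releasing a Resource.
bool RunWorkersThenCleanup(int num_workers) {
  std::vector<std::thread> workers;
  for (int i = 0; i < num_workers; ++i) {
    workers.emplace_back([] {
      [[maybe_unused]] toylib::Resource r;  // allocated and freed here
    });
  }
  for (std::thread& t : workers) t.join();  // every Resource is gone now
  return toylib::Cleanup();
}
```

Calling toylib::Cleanup() while a Resource is still alive returns false, which is the model's analogue of the unsafe atexit() timing described in KUDU-2439.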

As for testing, without the patch I saw the issue manifest itself in
about 1 in 30 runs of the kudu CLI tool with RELEASE bits on RHEL8.8 x86_64
and RHEL9.2 x86_64; with the patch, the issue is gone.  I ran the following
for reproduction, changing 100rep to 1000rep (--gtest_repeat=1000) for
verification runs:

  ./bin/kudu-tool-test --gtest_repeat=100

Without the patch, I saw core files left by the kudu CLI binary during
every 100rep run, and many of those core files had stack traces
attributable to this JIRA item.  With this patch, not a single crash
has been observed and no core files have been generated after many
100rep and a few 1000rep test runs with Kudu RELEASE bits.
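A rough sanity check on these numbers, under the simplifying assumption that each run is an independent trial that crashes with probability p of about 1/30: the chance of n consecutive clean runs is (1 - p)^n. Treating one gtest repetition as one trial is an assumption; if a single repetition invokes the kudu CLI several times, the real probability of a clean streak by luck is even smaller.

```cpp
#include <cassert>
#include <cmath>

// Probability that n independent runs all avoid a failure that occurs with
// per-run probability p, i.e. (1 - p)^n. Computed as exp(n * log1p(-p)) to
// stay accurate when p is small and n is large.
double ProbabilityOfNoFailures(double p, long n) {
  return std::exp(static_cast<double>(n) * std::log1p(-p));
}
```

With p = 1/30, about 3% of 100-run batches would be crash-free by chance, while a crash-free 1000-run batch would be roughly a 2e-15 event.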

NOTE: the issue no longer manifests itself in RELEASE builds.
      However, something still occasionally makes the kudu CLI crash
      on exit in DEBUG builds.  That stack trace doesn't involve any
      OpenSSL symbols: instead, it consists mostly of libtcmalloc symbols
      and also involves __do_global_dtors_aux() from librocksdb.

Change-Id: I472bb6b5a4edf2a1a03c0e3cdcd64e743b7e1e1f
Reviewed-on: http://gerrit.cloudera.org:8080/23262
Tested-by: Alexey Serbin <ale...@apache.org>
Reviewed-by: Abhishek Chennaka <achenn...@cloudera.com>


> There's no way to safely clean up a KuduClient or Messenger
> -----------------------------------------------------------
>
>                 Key: KUDU-2439
>                 URL: https://issues.apache.org/jira/browse/KUDU-2439
>             Project: Kudu
>          Issue Type: Bug
>          Components: client, rpc
>    Affects Versions: 1.8.0
>            Reporter: Adar Dembo
>            Priority: Major
>         Attachments: kudu-admin-test.7.txt.xz
>
>
> KuduClient has shared ownership, and its only "shutdown" knob is to drop its 
> last ref. This drops the last ref on the underlying Messenger object, but 
> Messenger itself has a funky "internal" vs. "external" ref system, and 
> destroying the KuduClient only drops the last external ref. The Messenger is 
> only destroyed when the last internal ref is dropped, and that only happens 
> when outstanding reactor threads finish whatever processing they were busy 
> doing. So, there's no way for a user to "destroy this KuduClient and wait for 
> all outstanding resources to be cleaned up".
> Why is this important? For one, there's a known data race with outstanding 
> work done by the KuduClient's DnsResolver. For two, there's a similar issue 
> with OpenSSL. OpenSSL 1.1 registers an atexit() handler that tears down its 
> global library state. For this to be safe, all allocated OpenSSL resources 
> must have been released by the time atexit() handlers run (which is to say, 
> all OpenSSL resources must be released by the time main() returns or exit() 
> is called). Because we can't wait on the KuduClient to destroy itself and its 
> OpenSSL resources, applications using the KuduClient may run the atexit() 
> handler at an unsafe time.
> Here's the TSAN output for the OpenSSL data race. It's trivial to reproduce 
> via reactor-test once the appropriate suppression is removed from 
> tsan-suppressions.txt:
> {noformat}
> WARNING: ThreadSanitizer: data race (pid=7914)
> Write of size 1 at 0x7b1000004340 by main thread:
> #0 pthread_rwlock_destroy 
> /home/adar/Source/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:1313
>  (reactor-test+0x48205e)
> #1 CRYPTO_THREAD_lock_free 
> /home/adar/openssl/openssl-1.1.0g/build_shared/../crypto/threads_pthread.c:95 
> (libcrypto.so.1.1+0x20d0e5)
> Previous atomic read of size 1 at 0x7b1000004340 by thread T16:
> #0 pthread_rwlock_wrlock 
> /home/adar/Source/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:1352
>  (reactor-test+0x45ffe3)
> #1 CRYPTO_THREAD_write_lock 
> /home/adar/openssl/openssl-1.1.0g/build_shared/../crypto/threads_pthread.c:66 
> (libcrypto.so.1.1+0x20d08a)
> #2 std::__1::__function::__func<void (*)(ssl_ctx_st*), 
> std::__1::allocator<void (*)(ssl_ctx_st*)>, void 
> (ssl_ctx_st*)>::operator()(ssl_ctx_st*&&) 
> /home/adar/Source/kudu/thirdparty/installed/tsan/include/c++/v1/functional:1562:12
>  (libsecurity.so+0x56b14)
> #3 std::__1::function<void (ssl_ctx_st*)>::operator()(ssl_ctx_st*) const 
> /home/adar/Source/kudu/thirdparty/installed/tsan/include/c++/v1/functional:1916:12
>  (libkrpc.so+0xb139d)
> #4 std::__1::unique_ptr<ssl_ctx_st, std::__1::function<void (ssl_ctx_st*)> 
> >::reset(ssl_ctx_st*) 
> /home/adar/Source/kudu/thirdparty/installed/tsan/include/c++/v1/memory:2598:7 
> (libkrpc.so+0xb0f9e)
> #5 std::__1::unique_ptr<ssl_ctx_st, std::__1::function<void (ssl_ctx_st*)> 
> >::~unique_ptr() 
> /home/adar/Source/kudu/thirdparty/installed/tsan/include/c++/v1/memory:2552 
> (libkrpc.so+0xb0f9e)
> #6 kudu::security::TlsContext::~TlsContext() 
> /home/adar/Source/kudu/build/tsan/../../src/kudu/security/tls_context.h:77 
> (libkrpc.so+0xb0f9e)
> #7 
> std::__1::default_delete<kudu::security::TlsContext>::operator()(kudu::security::TlsContext*)
>  const 
> /home/adar/Source/kudu/thirdparty/installed/tsan/include/c++/v1/memory:2285:5 
> (libkrpc.so+0xab32c)
> #8 std::__1::unique_ptr<kudu::security::TlsContext, 
> std::__1::default_delete<kudu::security::TlsContext> 
> >::reset(kudu::security::TlsContext*) 
> /home/adar/Source/kudu/thirdparty/installed/tsan/include/c++/v1/memory:2598 
> (libkrpc.so+0xab32c)
> #9 std::__1::unique_ptr<kudu::security::TlsContext, 
> std::__1::default_delete<kudu::security::TlsContext> >::~unique_ptr() 
> /home/adar/Source/kudu/thirdparty/installed/tsan/include/c++/v1/memory:2552 
> (libkrpc.so+0xab32c)
> #10 kudu::rpc::Messenger::~Messenger() 
> /home/adar/Source/kudu/build/tsan/../../src/kudu/rpc/messenger.cc:433 
> (libkrpc.so+0xab32c)
> #11 
> std::__1::default_delete<kudu::rpc::Messenger>::operator()(kudu::rpc::Messenger*)
>  const 
> /home/adar/Source/kudu/thirdparty/installed/tsan/include/c++/v1/memory:2285:5 
> (libkrpc.so+0xb0ddb)
> #12 std::__1::__shared_ptr_pointer<kudu::rpc::Messenger*, 
> std::__1::default_delete<kudu::rpc::Messenger>, 
> std::__1::allocator<kudu::rpc::Messenger> >::__on_zero_shared() 
> /home/adar/Source/kudu/thirdparty/installed/tsan/include/c++/v1/memory:3586 
> (libkrpc.so+0xb0ddb)
> #13 std::__1::__shared_count::__release_shared() 
> /home/adar/Source/kudu/thirdparty/installed/tsan/include/c++/v1/memory:3490:9 
> (reactor-test+0x4d0a2e)
> #14 std::__1::__shared_weak_count::__release_shared() 
> /home/adar/Source/kudu/thirdparty/installed/tsan/include/c++/v1/memory:3532 
> (reactor-test+0x4d0a2e)
> #15 std::__1::shared_ptr<kudu::rpc::Messenger>::~shared_ptr() 
> /home/adar/Source/kudu/thirdparty/installed/tsan/include/c++/v1/memory:4468 
> (reactor-test+0x4d0a2e)
> #16 std::__1::shared_ptr<kudu::rpc::Messenger>::reset() 
> /home/adar/Source/kudu/thirdparty/installed/tsan/include/c++/v1/memory:4603:5 
> (libkrpc.so+0xbefd1)
> #17 kudu::rpc::ReactorThread::RunThread() 
> /home/adar/Source/kudu/build/tsan/../../src/kudu/rpc/reactor.cc:482 
> (libkrpc.so+0xbefd1)
> #18 boost::_mfi::mf0<void, 
> kudu::rpc::ReactorThread>::operator()(kudu::rpc::ReactorThread*) const 
> /home/adar/Source/kudu/build/tsan/../../thirdparty/installed/tsan/include/boost/bind/mem_fn_template.hpp:49:29
>  (libkrpc.so+0xc8c69)
> #19 void boost::_bi::list1<boost::_bi::value<kudu::rpc::ReactorThread*> 
> >::operator()<boost::_mfi::mf0<void, kudu::rpc::ReactorThread>, 
> boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf0<void, 
> kudu::rpc::ReactorThread>&, boost::_bi::list0&, int) 
> /home/adar/Source/kudu/build/tsan/../../thirdparty/installed/tsan/include/boost/bind/bind.hpp:259:9
>  (libkrpc.so+0xc8bba)
> #20 boost::_bi::bind_t<void, boost::_mfi::mf0<void, 
> kudu::rpc::ReactorThread>, 
> boost::_bi::list1<boost::_bi::value<kudu::rpc::ReactorThread*> > 
> >::operator()() 
> /home/adar/Source/kudu/build/tsan/../../thirdparty/installed/tsan/include/boost/bind/bind.hpp:1222:16
>  (libkrpc.so+0xc8b43)
> #21 
> boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, 
> boost::_mfi::mf0<void, kudu::rpc::ReactorThread>, 
> boost::_bi::list1<boost::_bi::value<kudu::rpc::ReactorThread*> > >, 
> void>::invoke(boost::detail::function::function_buffer&) 
> /home/adar/Source/kudu/build/tsan/../../thirdparty/installed/tsan/include/boost/function/function_template.hpp:159:11
>  (libkrpc.so+0xc8939)
> #22 boost::function0<void>::operator()() const 
> /home/adar/Source/kudu/build/tsan/../../thirdparty/installed/tsan/include/boost/function/function_template.hpp:770:14
>  (libkrpc.so+0xb8a21)
> #23 kudu::Thread::SuperviseThread(void*) 
> /home/adar/Source/kudu/build/tsan/../../src/kudu/util/thread.cc:603:3 
> (libkudu_util.so+0x1c1264)
> Location is heap block of size 56 at 0x7b1000004340 allocated by main thread:
> #0 malloc 
> /home/adar/Source/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:666
>  (reactor-test+0x47cc9c)
> #1 CRYPTO_malloc 
> /home/adar/openssl/openssl-1.1.0g/build_shared/../crypto/mem.c:92 
> (libcrypto.so.1.1+0x1a6327)
> #2 CRYPTO_THREAD_run_once 
> /home/adar/openssl/openssl-1.1.0g/build_shared/../crypto/threads_pthread.c:106
>  (libcrypto.so.1.1+0x20d126)
> #3 CRYPTO_THREAD_run_once 
> /home/adar/openssl/openssl-1.1.0g/build_shared/../crypto/threads_pthread.c:106
>  (libcrypto.so.1.1+0x20d126)
> #4 kudu::rpc::Messenger::Init() 
> /home/adar/Source/kudu/build/tsan/../../src/kudu/rpc/messenger.cc:444:3 
> (libkrpc.so+0xa9589)
> #5 
> kudu::rpc::MessengerBuilder::Build(std::__1::shared_ptr<kudu::rpc::Messenger>*)
>  /home/adar/Source/kudu/build/tsan/../../src/kudu/rpc/messenger.cc:205:3 
> (libkrpc.so+0xa903d)
> #6 kudu::rpc::RpcTestBase::CreateMessenger(std::__1::basic_string<char, 
> std::__1::char_traits<char>, std::__1::allocator<char> > const&, 
> std::__1::shared_ptr<kudu::rpc::Messenger>*, int, bool, 
> std::__1::basic_string<char, std::__1::char_traits<char>, 
> std::__1::allocator<char> > const&, std::__1::basic_string<char, 
> std::__1::char_traits<char>, std::__1::allocator<char> > const&, 
> std::__1::basic_string<char, std::__1::char_traits<char>, 
> std::__1::allocator<char> > const&, std::__1::basic_string<char, 
> std::__1::char_traits<char>, std::__1::allocator<char> > const&) 
> /home/adar/Source/kudu/build/tsan/../../src/kudu/rpc/rpc-test-base.h:462:16 
> (reactor-test+0x4d3412)
> #7 kudu::rpc::ReactorTest::SetUp() 
> /home/adar/Source/kudu/build/tsan/../../src/kudu/rpc/reactor-test.cc:46:5 
> (reactor-test+0x4d02ae)
> #8 void 
> testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, 
> void>(testing::Test*, void (testing::Test::*)(), char const*) 
> /home/adar/Source/kudu/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2402:10
>  (libgmock.so+0x551df)
> #9 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, 
> void>(testing::Test*, void (testing::Test::*)(), char const*) 
> /home/adar/Source/kudu/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2438
>  (libgmock.so+0x551df)
> #10 testing::Test::Run() 
> /home/adar/Source/kudu/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2470:3
>  (libgmock.so+0x342b1)
> #11 testing::TestInfo::Run() 
> /home/adar/Source/kudu/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2656:11
>  (libgmock.so+0x3563c)
> #12 testing::TestCase::Run() 
> /home/adar/Source/kudu/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2774:28
>  (libgmock.so+0x36116)
> #13 testing::internal::UnitTestImpl::RunAllTests() 
> /home/adar/Source/kudu/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:4649:43
>  (libgmock.so+0x424ea)
> #14 bool 
> testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl,
>  bool>(testing::internal::UnitTestImpl*, bool 
> (testing::internal::UnitTestImpl::*)(), char const*) 
> /home/adar/Source/kudu/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2402:10
>  (libgmock.so+0x5614f)
> #15 bool 
> testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl,
>  bool>(testing::internal::UnitTestImpl*, bool 
> (testing::internal::UnitTestImpl::*)(), char const*) 
> /home/adar/Source/kudu/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2438
>  (libgmock.so+0x5614f)
> #16 testing::UnitTest::Run() 
> /home/adar/Source/kudu/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:4257:10
>  (libgmock.so+0x41dd2)
> #17 RUN_ALL_TESTS() 
> /home/adar/Source/kudu/build/tsan/../../thirdparty/installed/tsan/include/gtest/gtest.h:2233:46
>  (libkudu_test_main.so+0x33fb)
> #18 main 
> /home/adar/Source/kudu/build/tsan/../../src/kudu/util/test_main.cc:106:13 
> (libkudu_test_main.so+0x2bb6)
> Thread T16 'rpc reactor-793' (tid=7934, finished) created by main thread at:
> #0 pthread_create 
> /home/adar/Source/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:992
>  (reactor-test+0x45f7d6)
> #1 kudu::Thread::StartThread(std::__1::basic_string<char, 
> std::__1::char_traits<char>, std::__1::allocator<char> > const&, 
> std::__1::basic_string<char, std::__1::char_traits<char>, 
> std::__1::allocator<char> > const&, boost::function<void ()> const&, unsigned 
> long, scoped_refptr<kudu::Thread>*) 
> /home/adar/Source/kudu/build/tsan/../../src/kudu/util/thread.cc:556:15 
> (libkudu_util.so+0x1c0c8f)
> #2 kudu::Status kudu::Thread::Create<void (kudu::rpc::ReactorThread::*)(), 
> kudu::rpc::ReactorThread*>(std::__1::basic_string<char, 
> std::__1::char_traits<char>, std::__1::allocator<char> > const&, 
> std::__1::basic_string<char, std::__1::char_traits<char>, 
> std::__1::allocator<char> > const&, void (kudu::rpc::ReactorThread::* 
> const&)(), kudu::rpc::ReactorThread* const&, scoped_refptr<kudu::Thread>*) 
> /home/adar/Source/kudu/build/tsan/../../src/kudu/util/thread.h:164:12 
> (libkrpc.so+0xc4095)
> #3 kudu::rpc::ReactorThread::Init() 
> /home/adar/Source/kudu/build/tsan/../../src/kudu/rpc/reactor.cc:168:10 
> (libkrpc.so+0xbeace)
> #4 kudu::rpc::Reactor::Init() 
> /home/adar/Source/kudu/build/tsan/../../src/kudu/rpc/reactor.cc:728:18 
> (libkrpc.so+0xc2f81)
> #5 kudu::rpc::Messenger::Init() 
> /home/adar/Source/kudu/build/tsan/../../src/kudu/rpc/messenger.cc:446:5 
> (libkrpc.so+0xa95e2)
> #6 
> kudu::rpc::MessengerBuilder::Build(std::__1::shared_ptr<kudu::rpc::Messenger>*)
>  /home/adar/Source/kudu/build/tsan/../../src/kudu/rpc/messenger.cc:205:3 
> (libkrpc.so+0xa903d)
> #7 kudu::rpc::RpcTestBase::CreateMessenger(std::__1::basic_string<char, 
> std::__1::char_traits<char>, std::__1::allocator<char> > const&, 
> std::__1::shared_ptr<kudu::rpc::Messenger>*, int, bool, 
> std::__1::basic_string<char, std::__1::char_traits<char>, 
> std::__1::allocator<char> > const&, std::__1::basic_string<char, 
> std::__1::char_traits<char>, std::__1::allocator<char> > const&, 
> std::__1::basic_string<char, std::__1::char_traits<char>, 
> std::__1::allocator<char> > const&, std::__1::basic_string<char, 
> std::__1::char_traits<char>, std::__1::allocator<char> > const&) 
> /home/adar/Source/kudu/build/tsan/../../src/kudu/rpc/rpc-test-base.h:462:16 
> (reactor-test+0x4d3412)
> #8 kudu::rpc::ReactorTest::SetUp() 
> /home/adar/Source/kudu/build/tsan/../../src/kudu/rpc/reactor-test.cc:46:5 
> (reactor-test+0x4d02ae)
> #9 void 
> testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, 
> void>(testing::Test*, void (testing::Test::*)(), char const*) 
> /home/adar/Source/kudu/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2402:10
>  (libgmock.so+0x551df)
> #10 void 
> testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, 
> void>(testing::Test*, void (testing::Test::*)(), char const*) 
> /home/adar/Source/kudu/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2438
>  (libgmock.so+0x551df)
> #11 testing::Test::Run() 
> /home/adar/Source/kudu/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2470:3
>  (libgmock.so+0x342b1)
> #12 testing::TestInfo::Run() 
> /home/adar/Source/kudu/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2656:11
>  (libgmock.so+0x3563c)
> #13 testing::TestCase::Run() 
> /home/adar/Source/kudu/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2774:28
>  (libgmock.so+0x36116)
> #14 testing::internal::UnitTestImpl::RunAllTests() 
> /home/adar/Source/kudu/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:4649:43
>  (libgmock.so+0x424ea)
> #15 bool 
> testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl,
>  bool>(testing::internal::UnitTestImpl*, bool 
> (testing::internal::UnitTestImpl::*)(), char const*) 
> /home/adar/Source/kudu/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2402:10
>  (libgmock.so+0x5614f)
> #16 bool 
> testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl,
>  bool>(testing::internal::UnitTestImpl*, bool 
> (testing::internal::UnitTestImpl::*)(), char const*) 
> /home/adar/Source/kudu/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2438
>  (libgmock.so+0x5614f)
> #17 testing::UnitTest::Run() 
> /home/adar/Source/kudu/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:4257:10
>  (libgmock.so+0x41dd2)
> #18 RUN_ALL_TESTS() 
> /home/adar/Source/kudu/build/tsan/../../thirdparty/installed/tsan/include/gtest/gtest.h:2233:46
>  (libkudu_test_main.so+0x33fb)
> #19 main 
> /home/adar/Source/kudu/build/tsan/../../src/kudu/util/test_main.cc:106:13 
> (libkudu_test_main.so+0x2bb6)
> SUMMARY: ThreadSanitizer: data race 
> /home/adar/openssl/openssl-1.1.0g/build_shared/../crypto/threads_pthread.c:95 
> in CRYPTO_THREAD_lock_free
> {noformat}
>  
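The "destroy this KuduClient and wait" knob that the issue says is missing can be sketched as follows. ToyMessenger and ShutdownAndWait() are invented names, not Kudu's Messenger API; the sketch only assumes, as the description says, that reactor threads keep internal refs to the messenger. Joining the reactors before dropping the external ref makes the caller's ref the last one, so destruction (and anything it frees, such as a TLS context) happens on the caller's thread before main() returns, rather than on a reactor thread racing with atexit() handlers as in the TSAN report.

```cpp
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a Messenger: each reactor thread holds an
// "internal" shared_ptr ref while it runs, so dropping the user's
// ("external") ref alone does not destroy the object.
class ToyMessenger {
 public:
  static std::shared_ptr<ToyMessenger> Create(int num_reactors) {
    std::shared_ptr<ToyMessenger> m(new ToyMessenger());
    for (int i = 0; i < num_reactors; ++i) {
      std::shared_ptr<ToyMessenger> internal_ref = m;
      m->reactors_.emplace_back([internal_ref]() mutable {
        internal_ref->RunReactor();
        internal_ref.reset();  // drop the internal ref before the thread ends
      });
    }
    return m;
  }

  // Invented "shut down and wait" knob: signal the reactors and join them.
  // Once this returns, no reactor thread holds a ref, so the caller's
  // shared_ptr is the last one. Must not be called from a reactor thread,
  // since a thread cannot join itself.
  void ShutdownAndWait() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      closing_ = true;
    }
    cv_.notify_all();
    for (std::thread& t : reactors_) {
      if (t.joinable()) t.join();
    }
  }

 private:
  ToyMessenger() = default;

  // Stand-in for the reactor event loop: block until shutdown is requested.
  void RunReactor() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return closing_; });
  }

  std::mutex mu_;
  std::condition_variable cv_;
  bool closing_ = false;
  std::vector<std::thread> reactors_;
};
```

If the caller drops its ref without calling ShutdownAndWait(), the reactors keep the object alive for as long as they run, which mirrors the situation the issue describes.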



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
