[ https://issues.apache.org/jira/browse/KUDU-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223996#comment-17223996 ]
Andrew Wong edited comment on KUDU-3108 at 10/31/20, 4:30 AM:
--------------------------------------------------------------

I've been doing some fuzz testing using {{fuzz-itest.cc}} and reproduced this crash with the following sequence (ignore the {{-1}}s – their functionality is not committed yet).

{code:java}
TEST_F(FuzzTest, Kudu3108) {
  CreateTabletAndStartClusterWithSchema(CreateKeyValueTestSchema());
  RunFuzzCase({
      {TEST_INSERT, 1, -1},
      {TEST_FLUSH_OPS, -1},
      {TEST_FLUSH_TABLET},
      {TEST_INSERT_IGNORE, 3, -1},
      {TEST_DELETE, 1},
      {TEST_FLUSH_OPS, -1},
      {TEST_FLUSH_TABLET},
      {TEST_UPSERT, 3},
      {TEST_UPSERT_PK_ONLY, 1},
      {TEST_INSERT, 0, -1},
      {TEST_FLUSH_OPS, -1},
      {TEST_UPDATE_IGNORE, 0},
      {TEST_UPDATE, 3},
      {TEST_FLUSH_OPS, -1},
      {TEST_DIFF_SCAN, 5, 15},
    });
}
{code}

This results in the following crash:

{code:java}
F1030 21:16:58.411253 40800 schema.h:706] Check failed: KeyEquals(*lhs.schema()) && KeyEquals(*rhs.schema())
*** Check failure stack trace: ***
*** Aborted at 1604117818 (unix time) try "date -d @1604117818" if you are using GNU date ***
PC: @ 0x7f701fcf11d7 __GI_raise
*** SIGABRT (@0x111700009efd) received by PID 40701 (TID 0x7f6ff0f47700) from PID 40701; stack trace: ***
    @ 0x7f7026a70370 (unknown)
    @ 0x7f701fcf11d7 __GI_raise
    @ 0x7f701fcf28c8 __GI_abort
    @ 0x7f70224377b9 google::logging_fail()
    @ 0x7f7022438f8d google::LogMessage::Fail()
    @ 0x7f702243aee3 google::LogMessage::SendToLog()
    @ 0x7f7022438ae9 google::LogMessage::Flush()
    @ 0x7f702243b86f google::LogMessageFatal::~LogMessageFatal()
    @ 0x7f702cc99fbc kudu::Schema::Compare<>()
    @ 0x7f7026167cfd kudu::MergeIterator::RefillHotHeap()
    @ 0x7f7026167357 kudu::MergeIterator::AdvanceAndReheap()
    @ 0x7f7026169617 kudu::MergeIterator::MaterializeOneRow()
    @ 0x7f70261688e9 kudu::MergeIterator::NextBlock()
    @ 0x7f702cbddd9b kudu::tablet::Tablet::Iterator::NextBlock()
    @ 0x7f70317bcab3 kudu::tserver::TabletServiceImpl::HandleContinueScanRequest()
    @ 0x7f70317bb857 kudu::tserver::TabletServiceImpl::HandleNewScanRequest()
    @ 0x7f70317b464e kudu::tserver::TabletServiceImpl::Scan()
    @ 0x7f702ddfd762 _ZZN4kudu7tserver21TabletServerServiceIfC1ERK13scoped_refptrINS_12MetricEntityEERKS2_INS_3rpc13ResultTrackerEEENKUlPKN6google8protobuf7MessageEPSE_PNS7_10RpcContextEE4_clESG_SH_SJ_
    @ 0x7f702de0064d _ZNSt17_Function_handlerIFvPKN6google8protobuf7MessageEPS2_PN4kudu3rpc10RpcContextEEZNS6_7tserver21TabletServerServiceIfC1ERK13scoped_refptrINS6_12MetricEntityEERKSD_INS7_13ResultTrackerEEEUlS4_S5_S9_E4_E9_M_invokeERKSt9_Any_dataS4_S5_S9_
    @ 0x7f702b4ddcc2 std::function<>::operator()()
    @ 0x7f702b4dd6ed kudu::rpc::GeneratedServiceIf::Handle()
    @ 0x7f702b4dfff8 kudu::rpc::ServicePool::RunThread()
    @ 0x7f702b4de8c5 _ZZN4kudu3rpc11ServicePool4InitEiENKUlvE_clEv
    @ 0x7f702b4e0337 _ZNSt17_Function_handlerIFvvEZN4kudu3rpc11ServicePool4InitEiEUlvE_E9_M_invokeERKSt9_Any_data
    @ 0x7f7033524b9c std::function<>::operator()()
    @ 0x7f70248227e0 kudu::Thread::SuperviseThread()
    @ 0x7f7026a68dc5 start_thread
    @ 0x7f701fdb376d __clone
Aborted
{code}

I haven't fully grokked this sequence, but I will look into this in the coming days.

was (Author: andrew.wong):
I've been doing some fuzz testing using {{fuzz-itest.cc}} and reproduced this crash with the following sequence (ignore the {{-1}}s – their functionality is not committed yet).

{code:java}
TEST_F(FuzzTest, Kudu3108) {
  CreateTabletAndStartClusterWithSchema(CreateKeyValueTestSchema());
  RunFuzzCase({
      {TEST_INSERT, 1, -1},
      {TEST_FLUSH_OPS, -1},
      {TEST_FLUSH_TABLET},
      {TEST_INSERT_IGNORE, 3, -1},
      {TEST_DELETE, 1},
      {TEST_FLUSH_OPS, -1},
      {TEST_FLUSH_TABLET},
      {TEST_UPSERT, 3},
      {TEST_UPSERT_PK_ONLY, 1},
      {TEST_INSERT, 0, -1},
      {TEST_FLUSH_OPS, -1},
      {TEST_UPDATE_IGNORE, 0},
      {TEST_UPDATE, 3},
      {TEST_FLUSH_OPS, -1},
      {TEST_UPSERT_PK_ONLY, 1},
      {TEST_UPSERT, 3},
      {TEST_INSERT, 2, -1},
      {TEST_DIFF_SCAN, 5, 15},
    });
}
{code}

This results in the following crash:

{code:java}
F1030 21:16:58.411253 40800 schema.h:706] Check failed: KeyEquals(*lhs.schema()) && KeyEquals(*rhs.schema())
*** Check failure stack trace: ***
*** Aborted at 1604117818 (unix time) try "date -d @1604117818" if you are using GNU date ***
PC: @ 0x7f701fcf11d7 __GI_raise
*** SIGABRT (@0x111700009efd) received by PID 40701 (TID 0x7f6ff0f47700) from PID 40701; stack trace: ***
    @ 0x7f7026a70370 (unknown)
    @ 0x7f701fcf11d7 __GI_raise
    @ 0x7f701fcf28c8 __GI_abort
    @ 0x7f70224377b9 google::logging_fail()
    @ 0x7f7022438f8d google::LogMessage::Fail()
    @ 0x7f702243aee3 google::LogMessage::SendToLog()
    @ 0x7f7022438ae9 google::LogMessage::Flush()
    @ 0x7f702243b86f google::LogMessageFatal::~LogMessageFatal()
    @ 0x7f702cc99fbc kudu::Schema::Compare<>()
    @ 0x7f7026167cfd kudu::MergeIterator::RefillHotHeap()
    @ 0x7f7026167357 kudu::MergeIterator::AdvanceAndReheap()
    @ 0x7f7026169617 kudu::MergeIterator::MaterializeOneRow()
    @ 0x7f70261688e9 kudu::MergeIterator::NextBlock()
    @ 0x7f702cbddd9b kudu::tablet::Tablet::Iterator::NextBlock()
    @ 0x7f70317bcab3 kudu::tserver::TabletServiceImpl::HandleContinueScanRequest()
    @ 0x7f70317bb857 kudu::tserver::TabletServiceImpl::HandleNewScanRequest()
    @ 0x7f70317b464e kudu::tserver::TabletServiceImpl::Scan()
    @ 0x7f702ddfd762 _ZZN4kudu7tserver21TabletServerServiceIfC1ERK13scoped_refptrINS_12MetricEntityEERKS2_INS_3rpc13ResultTrackerEEENKUlPKN6google8protobuf7MessageEPSE_PNS7_10RpcContextEE4_clESG_SH_SJ_
    @ 0x7f702de0064d _ZNSt17_Function_handlerIFvPKN6google8protobuf7MessageEPS2_PN4kudu3rpc10RpcContextEEZNS6_7tserver21TabletServerServiceIfC1ERK13scoped_refptrINS6_12MetricEntityEERKSD_INS7_13ResultTrackerEEEUlS4_S5_S9_E4_E9_M_invokeERKSt9_Any_dataS4_S5_S9_
    @ 0x7f702b4ddcc2 std::function<>::operator()()
    @ 0x7f702b4dd6ed kudu::rpc::GeneratedServiceIf::Handle()
    @ 0x7f702b4dfff8 kudu::rpc::ServicePool::RunThread()
    @ 0x7f702b4de8c5 _ZZN4kudu3rpc11ServicePool4InitEiENKUlvE_clEv
    @ 0x7f702b4e0337 _ZNSt17_Function_handlerIFvvEZN4kudu3rpc11ServicePool4InitEiEUlvE_E9_M_invokeERKSt9_Any_data
    @ 0x7f7033524b9c std::function<>::operator()()
    @ 0x7f70248227e0 kudu::Thread::SuperviseThread()
    @ 0x7f7026a68dc5 start_thread
    @ 0x7f701fdb376d __clone
Aborted
{code}

I haven't fully grokked this sequence, but I will look into this in the coming days.

> Tablet server crashes when handle diffscan request
> ---------------------------------------------------
>
> Key: KUDU-3108
> URL: https://issues.apache.org/jira/browse/KUDU-3108
> Project: Kudu
> Issue Type: Bug
> Affects Versions: 1.10.1
> Reporter: YifanZhang
> Priority: Major
>
> When we ran an incremental backup for tables in a cluster with 20 tservers, 3 tservers crashed with identical coredump stacks:
> {code:java}
> Program terminated with signal 11, Segmentation fault.
> #0 kudu::Schema::Compare<kudu::RowBlockRow, kudu::RowBlockRow> (this=0x25b883680, lhs=..., rhs=...)
> at /home/zhangyifan8/work/kudu-xm/src/kudu/common/rowblock.h:267
> 267 /home/zhangyifan8/work/kudu-xm/src/kudu/common/rowblock.h: No such file or directory.
> Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.x86_64 cyrus-sasl-gssapi-2.1.26-20.el7_2.x86_64 cyrus-sasl-lib-2.1.26-20.el7_2.x86_64 cyrus-sasl-md5-2.1.26-20.el7_2.x86_64 cyrus-sasl-plain-2.1.26-20.el7_2.x86_64 elfutils-libelf-0.166-2.el7.x86_64 elfutils-libs-0.166-2.el7.x86_64 glibc-2.17-157.el7_3.1.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.14.1-27.el7_3.x86_64 libattr-2.4.46-12.el7.x86_64 libcap-2.22-8.el7.x86_64 libcom_err-1.42.9-9.el7.x86_64 libdb-5.3.21-19.el7.x86_64 libgcc-4.8.5-28.el7_5.1.x86_64 libselinux-2.5-6.el7.x86_64 ncurses-libs-5.9-13.20130511.el7.x86_64 nss-softokn-freebl-3.16.2.3-14.4.el7.x86_64 openssl-libs-1.0.1e-60.el7_3.1.x86_64 pcre-8.32-15.el7_2.1.x86_64 systemd-libs-219-30.el7_3.8.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-17.el7.x86_64
> (gdb) bt
> #0 kudu::Schema::Compare<kudu::RowBlockRow, kudu::RowBlockRow> (this=0x25b883680, lhs=..., rhs=...) at /home/zhangyifan8/work/kudu-xm/src/kudu/common/rowblock.h:267
> #1 0x0000000001da51fb in kudu::MergeIterator::RefillHotHeap (this=this@entry=0x78f6ec500) at /home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:720
> #2 0x0000000001da622b in kudu::MergeIterator::AdvanceAndReheap (this=this@entry=0x78f6ec500, state=0xd1661a000, num_rows_to_advance=num_rows_to_advance@entry=1) at /home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:690
> #3 0x0000000001da7927 in kudu::MergeIterator::MaterializeOneRow (this=this@entry=0x78f6ec500, dst=dst@entry=0x7f0d5cc9ffc0, dst_row_idx=dst_row_idx@entry=0x7f0d5cc9fbb0) at /home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:894
> #4 0x0000000001da7de3 in kudu::MergeIterator::NextBlock (this=0x78f6ec500, dst=0x7f0d5cc9ffc0) at /home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:796
> #5 0x0000000000a9ff19 in kudu::tablet::Tablet::Iterator::NextBlock (this=<optimized out>, dst=<optimized out>) at /home/zhangyifan8/work/kudu-xm/src/kudu/tablet/tablet.cc:2499
> #6 0x000000000095475c in kudu::tserver::TabletServiceImpl::HandleContinueScanRequest (this=this@entry=0x53b5a90, req=req@entry=0x7f0d5cca0720, rpc_context=rpc_context@entry=0x5e512a460, result_collector=result_collector@entry=0x7f0d5cca0a00, has_more_results=has_more_results@entry=0x7f0d5cca0886, error_code=error_code@entry=0x7f0d5cca0888) at /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:2565
> #7 0x0000000000966564 in kudu::tserver::TabletServiceImpl::HandleNewScanRequest (this=this@entry=0x53b5a90, replica=0xf5c0189c0, req=req@entry=0x2a15c240, rpc_context=rpc_context@entry=0x5e512a460, result_collector=result_collector@entry=0x7f0d5cca0a00, scanner_id=scanner_id@entry=0x7f0d5cca0940, snap_timestamp=snap_timestamp@entry=0x7f0d5cca0950, has_more_results=has_more_results@entry=0x7f0d5cca0886, error_code=error_code@entry=0x7f0d5cca0888) at /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:2476
> #8 0x0000000000967f4b in kudu::tserver::TabletServiceImpl::Scan (this=0x53b5a90, req=0x2a15c240, resp=0x56f9be6c0, context=0x5e512a460) at /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:1674
> #9 0x0000000001d2e449 in operator() (__args#2=0x5e512a460, __args#1=0x56f9be6c0, __args#0=<optimized out>, this=0x497ecdd8) at /usr/include/c++/4.8.2/functional:2471
> #10 kudu::rpc::GeneratedServiceIf::Handle (this=0x53b5a90, call=<optimized out>) at /home/zhangyifan8/work/kudu-xm/src/kudu/rpc/service_if.cc:139
> #11 0x0000000001d2eb49 in kudu::rpc::ServicePool::RunThread (this=0x2ab69560) at /home/zhangyifan8/work/kudu-xm/src/kudu/rpc/service_pool.cc:225
> #12 0x0000000001e9e924 in operator() (this=0x90fb52e8) at /home/zhangyifan8/work/kudu-xm/thirdparty/installed/uninstrumented/include/boost/function/function_template.hpp:771
> #13 kudu::Thread::SuperviseThread (arg=0x90fb52c0) at /home/zhangyifan8/work/kudu-xm/src/kudu/util/thread.cc:657
> #14 0x00007f103b20cdc5 in start_thread () from /lib64/libpthread.so.0
> #15 0x00007f103956673d in clone () from /lib64/libc.so.6
> {code}
> Before running the first full backup, we set an extra config for these tables via the CLI tool:
> {code:java}
> kudu table set_extra_config @c3tst-master xxx kudu.table.history_max_age_sec 604800
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)