[ https://issues.apache.org/jira/browse/KUDU-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Wong resolved KUDU-3271.
-------------------------------
    Fix Version/s: 1.13.0
       Resolution: Fixed

I checked out the commit before 163cd25 and copied over the test in the patch. After running it a couple of times, I ran into:
{code:java}
I0408 22:49:44.993857 54213 ts_tablet_manager.cc:1144] T ffffffffffffffffffffffffffffffff P dbfd161726d64fa0b01e8a9237fb37d1: Time spent starting tablet: real 0.004s user 0.002s sys 0.002s
I0408 22:49:44.993940 54215 raft_consensus.cc:683] T ffffffffffffffffffffffffffffffff P dbfd161726d64fa0b01e8a9237fb37d1 [term 1 LEADER]: Becoming Leader. State: Replica: dbfd161726d64fa0b01e8a9237fb37d1, State: Running, Role: LEADER
W0408 22:49:44.993994 54151 reactor.cc:681] Failed to create an outbound connection to 255.255.255.255:1 because connect() failed: Network error: connect(2) error: Network is unreachable (error 101)
I0408 22:49:44.994019 54215 consensus_queue.cc:227] T ffffffffffffffffffffffffffffffff P dbfd161726d64fa0b01e8a9237fb37d1 [LEADER]: Queue going to LEADER mode. State: All replicated index: 0, Majority replicated index: 0, Committed index: 0, Last appended: 0.0, Last appended by leader: 0, Current term: 1, Majority size: 1, State: 0, Mode: LEADER, active raft config: opid_index: -1 peers { permanent_uuid: "dbfd161726d64fa0b01e8a9237fb37d1" member_type: VOTER last_known_addr { host: "127.0.0.1" port: 44157 } }
*** Aborted at 1617947385 (unix time) try "date -d @1617947385" if you are using GNU date ***
I0408 22:49:45.024998 54168 tablet_service.cc:2747] Scan: Not found: Scanner c1c38a30a9cd480b8affda8b74c7872a not found (it may have expired): call sequence id=100, remote={username='awong'} at 127.0.0.1:60548
I0408 22:49:45.025013 54167 tablet_service.cc:2747] Scan: Not found: Scanner c1c38a30a9cd480b8affda8b74c7872a not found (it may have expired): call sequence id=100, remote={username='awong'} at 127.0.0.1:60548
I0408 22:49:45.025015 54166 tablet_service.cc:2747] Scan: Not found: Scanner c1c38a30a9cd480b8affda8b74c7872a not found (it may have expired): call sequence id=101, remote={username='awong'} at 127.0.0.1:60548
I0408 22:49:45.025023 54163 tablet_service.cc:2747] Scan: Not found: Scanner c1c38a30a9cd480b8affda8b74c7872a not found (it may have expired): call sequence id=100, remote={username='awong'} at 127.0.0.1:60548
I0408 22:49:45.025087 54167 tablet_service.cc:2747] Scan: Not found: Scanner c1c38a30a9cd480b8affda8b74c7872a not found (it may have expired): call sequence id=101, remote={username='awong'} at 127.0.0.1:60548
I0408 22:49:45.025157 54167 tablet_service.cc:2747] Scan: Not found: Scanner c1c38a30a9cd480b8affda8b74c7872a not found (it may have expired): call sequence id=100, remote={username='awong'} at 127.0.0.1:60548
PC: @ 0x229eed3 kudu::UnionIterator::HasNext()
*** SIGSEGV (@0x0) received by PID 54140 (TID 0x7fa30cfde700) from PID 0; stack trace: ***
    @ 0x7fa31d2b9370 (unknown)
    @ 0x229eed3 kudu::UnionIterator::HasNext()
    @ 0xb3300c kudu::tserver::TabletServiceImpl::HandleContinueScanRequest()
    @ 0xb45a09 kudu::tserver::TabletServiceImpl::Scan()
    @ 0x2227b79 kudu::rpc::GeneratedServiceIf::Handle()
    @ 0x2228839 kudu::rpc::ServicePool::RunThread()
    @ 0x23af01f kudu::Thread::SuperviseThread()
    @ 0x7fa31d2b1dc5 start_thread
    @ 0x7fa31b60976d __clone
Segmentation fault
{code}
So I think it's safe to say this was indeed addressed by Todd's locking commit.

[~zhangyifan27] If you're able, feel free to pull 163cd25 into your version of Kudu to prevent this in the future, or consider upgrading to 1.13 or higher.

> Tablet server crashed when handle scan request
> ----------------------------------------------
>
>                 Key: KUDU-3271
>                 URL: https://issues.apache.org/jira/browse/KUDU-3271
>             Project: Kudu
>          Issue Type: Bug
>   Affects Versions: 1.12.0
>            Reporter: YifanZhang
>            Priority: Major
>             Fix For: 1.13.0
>
>         Attachments: tablet-52a743.log
>
>
> We found that one of the Kudu tablet servers crashed while handling a scan request.
> The scanned table had no row operations at that time. This issue has come up
> only once so far.
> The coredump stack is:
> {code:java}
> Program terminated with signal 11, Segmentation fault.
> (gdb) bt
> #0 kudu::tablet::DeltaApplier::HasNext (this=<optimized out>) at /home/zhangyifan8/work/kudu-xm/src/kudu/tablet/delta_applier.cc:84
> #1 0x0000000002185900 in kudu::UnionIterator::HasNext (this=<optimized out>) at /home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:1051
> #2 0x0000000000a2ea8f in kudu::tserver::ScannerManager::UnregisterScanner (this=0x4fea140, scanner_id=...) at /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/scanners.cc:195
> #3 0x00000000009e7adf in ~ScopedUnregisterScanner (this=0x7f2d72167610, __in_chrg=<optimized out>) at /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/scanners.h:179
> #4 kudu::tserver::TabletServiceImpl::HandleContinueScanRequest (this=this@entry=0x60edef0, req=req@entry=0x9582e880, rpc_context=rpc_context@entry=0x8151d7800, result_collector=result_collector@entry=0x7f2d721679f0, has_more_results=has_more_results@entry=0x7f2d721678f9, error_code=error_code@entry=0x7f2d721678fc) at /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:2737
> #5 0x00000000009fb009 in kudu::tserver::TabletServiceImpl::Scan (this=0x60edef0, req=0x9582e880, resp=0xb87b16de0, context=0x8151d7800) at /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:1907
> #6 0x000000000210f019 in operator() (__args#2=0x8151d7800, __args#1=0xb87b16de0, __args#0=<optimized out>, this=0x4e0c7708) at /usr/include/c++/4.8.2/functional:2471
> #7 kudu::rpc::GeneratedServiceIf::Handle (this=0x60edef0, call=<optimized out>) at /home/zhangyifan8/work/kudu-xm/src/kudu/rpc/service_if.cc:139
> #8 0x000000000210fcd9 in kudu::rpc::ServicePool::RunThread (this=0x50fb9e0) at /home/zhangyifan8/work/kudu-xm/src/kudu/rpc/service_pool.cc:225
> #9 0x000000000228ecaf in operator() (this=0xc1a58c28) at /usr/include/c++/4.8.2/functional:2471
> #10 kudu::Thread::SuperviseThread (arg=0xc1a58c00) at /home/zhangyifan8/work/kudu-xm/src/kudu/util/thread.cc:674
> #11 0x00007f2de6b8adc5 in start_thread () from /lib64/libpthread.so.0
> #12 0x00007f2de4e6873d in clone () from /lib64/libc.so.6
> {code}
>
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)