[ 
https://issues.apache.org/jira/browse/KUDU-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong resolved KUDU-3271.
-------------------------------
    Fix Version/s: 1.13.0
       Resolution: Fixed

I checked out the commit before 163cd25 and copied over the test from the patch. 
After running it a couple of times, I hit the following crash:
{code:java}
I0408 22:49:44.993857 54213 ts_tablet_manager.cc:1144] T 
ffffffffffffffffffffffffffffffff P dbfd161726d64fa0b01e8a9237fb37d1: Time spent 
starting tablet: real 0.004s   user 0.002s     sys 0.002s
I0408 22:49:44.993940 54215 raft_consensus.cc:683] T 
ffffffffffffffffffffffffffffffff P dbfd161726d64fa0b01e8a9237fb37d1 [term 1 
LEADER]: Becoming Leader. State: Replica: dbfd161726d64fa0b01e8a9237fb37d1, 
State: Running, Role: LEADER
W0408 22:49:44.993994 54151 reactor.cc:681] Failed to create an outbound 
connection to 255.255.255.255:1 because connect() failed: Network error: 
connect(2) error: Network is unreachable (error 101)
I0408 22:49:44.994019 54215 consensus_queue.cc:227] T 
ffffffffffffffffffffffffffffffff P dbfd161726d64fa0b01e8a9237fb37d1 [LEADER]: 
Queue going to LEADER mode. State: All replicated index: 0, Majority replicated 
index: 0, Committed index: 0, Last appended: 0.0, Last appended by leader: 0, 
Current term: 1, Majority size: 1, State: 0, Mode: LEADER, active raft config: 
opid_index: -1 peers { permanent_uuid: "dbfd161726d64fa0b01e8a9237fb37d1" 
member_type: VOTER last_known_addr { host: "127.0.0.1" port: 44157 } }
*** Aborted at 1617947385 (unix time) try "date -d @1617947385" if you are 
using GNU date ***
I0408 22:49:45.024998 54168 tablet_service.cc:2747] Scan: Not found: Scanner 
c1c38a30a9cd480b8affda8b74c7872a not found (it may have expired): call sequence 
id=100, remote={username='awong'} at 127.0.0.1:60548
I0408 22:49:45.025013 54167 tablet_service.cc:2747] Scan: Not found: Scanner 
c1c38a30a9cd480b8affda8b74c7872a not found (it may have expired): call sequence 
id=100, remote={username='awong'} at 127.0.0.1:60548
I0408 22:49:45.025015 54166 tablet_service.cc:2747] Scan: Not found: Scanner 
c1c38a30a9cd480b8affda8b74c7872a not found (it may have expired): call sequence 
id=101, remote={username='awong'} at 127.0.0.1:60548
I0408 22:49:45.025023 54163 tablet_service.cc:2747] Scan: Not found: Scanner 
c1c38a30a9cd480b8affda8b74c7872a not found (it may have expired): call sequence 
id=100, remote={username='awong'} at 127.0.0.1:60548
I0408 22:49:45.025087 54167 tablet_service.cc:2747] Scan: Not found: Scanner 
c1c38a30a9cd480b8affda8b74c7872a not found (it may have expired): call sequence 
id=101, remote={username='awong'} at 127.0.0.1:60548
I0408 22:49:45.025157 54167 tablet_service.cc:2747] Scan: Not found: Scanner 
c1c38a30a9cd480b8affda8b74c7872a not found (it may have expired): call sequence 
id=100, remote={username='awong'} at 127.0.0.1:60548
PC: @          0x229eed3 kudu::UnionIterator::HasNext()
*** SIGSEGV (@0x0) received by PID 54140 (TID 0x7fa30cfde700) from PID 0; stack 
trace: ***
    @     0x7fa31d2b9370 (unknown)
    @          0x229eed3 kudu::UnionIterator::HasNext()
    @           0xb3300c 
kudu::tserver::TabletServiceImpl::HandleContinueScanRequest()
    @           0xb45a09 kudu::tserver::TabletServiceImpl::Scan()
    @          0x2227b79 kudu::rpc::GeneratedServiceIf::Handle()
    @          0x2228839 kudu::rpc::ServicePool::RunThread()
    @          0x23af01f kudu::Thread::SuperviseThread()
    @     0x7fa31d2b1dc5 start_thread
    @     0x7fa31b60976d __clone
Segmentation fault {code}
So I think it's safe to say this was indeed addressed by Todd's locking commit.
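
In the log above, several Scan calls for the same scanner ID arrive within microseconds of each other on different service threads, which is consistent with concurrent access to a single scanner's iterator. As a rough illustration of the general idea of serializing such access with a per-scanner lock, here is a minimal sketch. It is not the code from 163cd25: FakeIterator, LockedScanner, and RunConcurrentScans are hypothetical names invented for this example, and the real Kudu classes differ.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a row iterator; not Kudu's UnionIterator.
class FakeIterator {
 public:
  explicit FakeIterator(int rows) : remaining_(rows) {}
  bool HasNext() const { return remaining_ > 0; }
  void Next() { --remaining_; }

 private:
  int remaining_;
};

// Hypothetical scanner that guards every use of its iterator with one mutex,
// so a caller can never observe an iterator that another caller is
// tearing down.
class LockedScanner {
 public:
  explicit LockedScanner(int rows) : iter_(new FakeIterator(rows)) {}

  // Consumes one row if the iterator still exists and has rows left.
  // Returns true if a row was consumed.
  bool ContinueScan() {
    std::lock_guard<std::mutex> l(lock_);
    if (!iter_ || !iter_->HasNext()) {
      iter_.reset();  // Tear down once exhausted, still under the lock.
      return false;
    }
    iter_->Next();
    return true;
  }

 private:
  std::mutex lock_;
  std::unique_ptr<FakeIterator> iter_;
};

// Runs num_threads concurrent callers against one scanner and returns the
// total number of rows they consumed between them.
int RunConcurrentScans(int rows, int num_threads) {
  assert(rows >= 0 && num_threads > 0);
  LockedScanner scanner(rows);
  std::vector<int> counts(num_threads, 0);
  std::vector<std::thread> threads;
  for (int i = 0; i < num_threads; ++i) {
    // Each thread writes only to its own slot in counts.
    threads.emplace_back([&scanner, &counts, i] {
      while (scanner.ContinueScan()) ++counts[i];
    });
  }
  for (auto& t : threads) t.join();
  int total = 0;
  for (int c : counts) total += c;
  return total;
}
```

With the lock in place, a call such as RunConcurrentScans(1000, 4) consumes each row exactly once no matter how the threads interleave. Without the lock, one thread could reset the iterator while another is inside HasNext(), which is the kind of use-after-teardown that would match the SIGSEGV at address 0x0 above.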

[~zhangyifan27] If you're able, feel free to pull 163cd25 into your version of 
Kudu to prevent this in the future, or consider upgrading to 1.13 or higher.

> Tablet server crashed when handle scan request
> ----------------------------------------------
>
>                 Key: KUDU-3271
>                 URL: https://issues.apache.org/jira/browse/KUDU-3271
>             Project: Kudu
>          Issue Type: Bug
>    Affects Versions: 1.12.0
>            Reporter: YifanZhang
>            Priority: Major
>             Fix For: 1.13.0
>
>         Attachments: tablet-52a743.log
>
>
> We found that one of our Kudu tablet servers crashed while handling a scan 
> request. There were no row operations on the scanned table at that time. This 
> issue has only come up once so far.
> Coredump stack is:
> {code:java}
> Program terminated with signal 11, Segmentation fault.
> (gdb) bt
> #0  kudu::tablet::DeltaApplier::HasNext (this=<optimized out>) at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/tablet/delta_applier.cc:84
> #1  0x0000000002185900 in kudu::UnionIterator::HasNext (this=<optimized out>) 
> at /home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:1051
> #2  0x0000000000a2ea8f in kudu::tserver::ScannerManager::UnregisterScanner 
> (this=0x4fea140, scanner_id=...) at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/scanners.cc:195
> #3  0x00000000009e7adf in ~ScopedUnregisterScanner (this=0x7f2d72167610, 
> __in_chrg=<optimized out>) at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/scanners.h:179
> #4  kudu::tserver::TabletServiceImpl::HandleContinueScanRequest 
> (this=this@entry=0x60edef0, req=req@entry=0x9582e880, 
> rpc_context=rpc_context@entry=0x8151d7800,     
> result_collector=result_collector@entry=0x7f2d721679f0, 
> has_more_results=has_more_results@entry=0x7f2d721678f9, 
> error_code=error_code@entry=0x7f2d721678fc)    at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:2737
> #5  0x00000000009fb009 in kudu::tserver::TabletServiceImpl::Scan 
> (this=0x60edef0, req=0x9582e880, resp=0xb87b16de0, context=0x8151d7800)    at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:1907
> #6  0x000000000210f019 in operator() (__args#2=0x8151d7800, 
> __args#1=0xb87b16de0, __args#0=<optimized out>, this=0x4e0c7708) at 
> /usr/include/c++/4.8.2/functional:2471
> #7  kudu::rpc::GeneratedServiceIf::Handle (this=0x60edef0, call=<optimized 
> out>) at /home/zhangyifan8/work/kudu-xm/src/kudu/rpc/service_if.cc:139
> #8  0x000000000210fcd9 in kudu::rpc::ServicePool::RunThread (this=0x50fb9e0) 
> at /home/zhangyifan8/work/kudu-xm/src/kudu/rpc/service_pool.cc:225
> #9  0x000000000228ecaf in operator() (this=0xc1a58c28) at 
> /usr/include/c++/4.8.2/functional:2471
> #10 kudu::Thread::SuperviseThread (arg=0xc1a58c00) at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/util/thread.cc:674
> #11 0x00007f2de6b8adc5 in start_thread () from /lib64/libpthread.so.0
> #12 0x00007f2de4e6873d in clone () from /lib64/libc.so.6
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
