[ https://issues.apache.org/jira/browse/KUDU-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17765274#comment-17765274 ]

ASF subversion and git services commented on KUDU-3461:
-------------------------------------------------------

Commit 438216ad33a7d096f276eef6cc52981c3b2fc2c9 in kudu's branch refs/heads/master from Ashwani Raina
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=438216ad3 ]

KUDU-3461 [client] Avoid impala crash by returning error if invalid tablet id found

Each Kudu C++ client maintains its own metadata cache (metacache). So, if one
client is issuing insert operations on a partition while another client issues
a DDL operation on the same partition that should invalidate cached metadata,
the first client's cache does not get invalidated. In some workflows, this can
lead to infinite recursion in the C++ client code, which eventually crashes the
Impala daemon with a stack overflow.
The same situation can happen with a mix of C++ and Java clients, as long as
at least one C++ client is involved in the workflow.
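A minimal sketch of how a per-client cache ends up stale, assuming a toy model rather than Kudu's API: `Master`, `Client`, `LookupTablet()` and `RecreateRange()` are hypothetical names, and the tablet ids are made-up strings. It only illustrates that a DROP + ADD of a range issued through one client leaves another client's cached mapping untouched.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the master's authoritative view of the table:
// range-partition key -> id of the tablet currently serving that range.
struct Master {
  std::map<std::string, std::string> tablets;
  int next_id = 0;

  void AddRange(const std::string& key) {
    tablets[key] = "tablet-" + std::to_string(next_id++);
  }
  void DropRange(const std::string& key) { tablets.erase(key); }
};

// Hypothetical client with a private metadata cache. Each Client object owns
// its own cache, so nothing another Client does can invalidate it.
struct Client {
  Master* master = nullptr;
  std::map<std::string, std::string> metacache;

  // Returns the cached tablet id if present; asks the master only on a miss.
  std::string LookupTablet(const std::string& key) {
    assert(master != nullptr);
    auto it = metacache.find(key);
    if (it != metacache.end()) return it->second;
    const std::string id = master->tablets.at(key);
    metacache[key] = id;
    return id;
  }

  // DROP + ADD of the same range: the master assigns a new tablet id, and
  // only this client's own cache entry is invalidated.
  void RecreateRange(const std::string& key) {
    master->DropRange(key);
    master->AddRange(key);
    metacache.erase(key);
  }
};
```

With one `Master m` holding range `"20230301"`, a `writer` client that looked the key up before a second client called `RecreateRange("20230301")` keeps returning `"tablet-0"` from its cache, while the master now maps the key to `"tablet-1"`.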

The short-term fix avoids the crash by detecting the invalid tablet id
condition and returning an error from the Kudu C++ client to the Impala daemon.
The following steps reproduce the issue from impala-shell:
+++
1. drop table if exists impala_crash;
2. create table if not exists impala_crash \
        ( dt string, col string, primary key(dt) ) \
        partition by range(dt) ( partition values <= '00000000' ) \
        stored as kudu;
3. alter table impala_crash drop if exists range partition value='20230301';
4. alter table impala_crash add if not exists range partition value='20230301';
5. insert into impala_crash values ('20230301','abc');
6. alter table impala_crash drop if exists range partition value='20230301';
7. alter table impala_crash add if not exists range partition value='20230301';
8. insert into impala_crash values ('20230301','abc');
+++
The last statement (#8) causes impalad (connected to impala-shell) to crash.
With this change, the last statement fails with a "Status::InvalidArgument()"
error instead.

This change also includes a unit test that covers both scenarios:
1. Reproduce the infinite recursion without the fix, and expect it to crash.
2. Reproduce the infinite recursion with the fix, and expect a
   "Status::InvalidArgument()" error instead of a crash due to stack overflow.
   Inserting the row again after the last step should then succeed, because
   the stale cache entry for the tablet has been cleared by that point.
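The PickLeader()/LookupTabletByKey() cycle and the two test scenarios can be modelled in a few lines of C++. This is a hedged toy sketch, not the commit's code: `PickerModel`, `Outcome`, `max_depth` and the tablet-id comparison are hypothetical, and a depth limit stands in for the real stack overflow so the example terminates. The key assumption is that the fast-path lookup finds a usable entry for a different tablet, so its callback re-enters PickLeader() synchronously for the stale tablet.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical outcome codes for the toy model below.
enum class Outcome { kLeaderFound, kInvalidArgument, kStackExhausted };

// Toy model of the PickLeader() <-> LookupTabletByKey() cycle. The names
// mirror the backtrace frames, but none of this is Kudu's real code.
struct PickerModel {
  std::string picker_tablet_id;   // tablet the picker still holds (stale)
  std::string cache_tablet_id;    // tablet the cache now maps the key to
  bool stale_has_leader = false;  // false: every replica is marked failed
  bool check_tablet_id = false;   // true: the short-term fix is enabled
  int max_depth = 1000;           // stands in for the finite thread stack

  // Fast path: the cache already has a usable entry for the key, so the
  // callback runs synchronously on the caller's stack, with no RPC between.
  void LookupTabletByKey(std::string* found,
                         const std::function<void()>& callback) {
    assert(found != nullptr);
    *found = cache_tablet_id;
    callback();
  }

  Outcome PickLeader(int depth = 0) {
    if (stale_has_leader) return Outcome::kLeaderFound;
    if (depth > max_depth) return Outcome::kStackExhausted;  // model of overflow
    std::string found;
    Outcome result = Outcome::kLeaderFound;
    // No leader for the stale tablet: look the key up again and retry.
    LookupTabletByKey(&found, [&] {
      if (check_tablet_id && found != picker_tablet_id) {
        // The key now belongs to a different tablet: fail instead of looping.
        result = Outcome::kInvalidArgument;
        return;
      }
      result = PickLeader(depth + 1);  // same stale tablet, same outcome
    });
    return result;
  }
};
```

With `picker_tablet_id = "tablet-0"` and `cache_tablet_id = "tablet-1"`, PickLeader() returns `Outcome::kStackExhausted` (scenario 1); setting `check_tablet_id = true` makes it return `Outcome::kInvalidArgument` on the first callback (scenario 2). Turning the mismatch into an error bounds the recursion at one level instead of letting the stack size decide.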

Change-Id: Ia09cf6fb1b1d10f1ad13a62b5c863bcd1e3ab26a
Reviewed-on: http://gerrit.cloudera.org:8080/20270
Reviewed-by: Alexey Serbin <ale...@apache.org>
Tested-by: Kudu Jenkins


> Kudu client can blow the stack with infinite recursions between PickLeader() 
> and LookupTabletByKey()
> ----------------------------------------------------------------------------------------------------
>
>                 Key: KUDU-3461
>                 URL: https://issues.apache.org/jira/browse/KUDU-3461
>             Project: Kudu
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 1.17.0
>            Reporter: Joe McDonnell
>            Assignee: Ashwani Raina
>            Priority: Blocker
>
> In an Impala cluster, we ran into a scenario that causes Impala to crash with 
> a SIGSEGV. When reproducing while running in gdb, we see the stack get blown 
> out with this recursion:
> {noformat}
> #0  0x00007f983e031a1c in clock_gettime ()
> #1  0x00007f983bfda0b5 in __GI___clock_gettime (clock_id=clock_id@entry=1, 
> tp=0x7f967bd8b070) at ../sysdeps/unix/sysv/linux/clock_gettime.c:38
> #2  0x00007f983c9f8e48 in kudu::Stopwatch::GetTimes (times=0x7f967bd8b1b0, 
> this=<optimized out>, this=<optimized out>) at 
> /mnt/source/kudu/kudu-345fd44ca3/src/kudu/util/stopwatch.h:294
> #3  0x00007f983ca09829 in kudu::Stopwatch::stop (this=0x7f967bd8b320) at 
> /mnt/source/kudu/kudu-345fd44ca3/src/kudu/util/stopwatch.h:218
> #4  kudu::Stopwatch::stop (this=0x7f967bd8b320) at 
> /mnt/source/kudu/kudu-345fd44ca3/src/kudu/util/stopwatch.h:213
> #5  kudu::sw_internal::LogTiming::Print (max_expected_millis=50, 
> this=0x7f967bd8b320) at 
> /mnt/source/kudu/kudu-345fd44ca3/src/kudu/util/stopwatch.h:359
> #6  kudu::sw_internal::LogTiming::~LogTiming (this=0x7f967bd8b320, 
> __in_chrg=<optimized out>) at 
> /mnt/source/kudu/kudu-345fd44ca3/src/kudu/util/stopwatch.h:329
> #7  0x00007f983c9fe32c in 
> kudu::client::internal::MetaCache::LookupEntryByKeyFastPath (this=<optimized 
> out>, table=<optimized out>, partition_key=..., entry=0x7f967bd8b4c0) at 
> /mnt/source/kudu/kudu-345fd44ca3/src/kudu/util/locks.h:99
> #8  0x00007f983c9fe656 in kudu::client::internal::MetaCache::DoFastPathLookup 
> (this=0xde431e0, table=0xf899300, partition_key=0x7f967bd8b700, 
> lookup_type=kudu::client::internal::MetaCache::LookupType::kPoint, 
> remote_tablet=0x0)
>     at /mnt/source/kudu/kudu-345fd44ca3/src/kudu/client/meta_cache.cc:1243
> #9  0x00007f983ca05731 in 
> kudu::client::internal::MetaCache::LookupTabletByKey(kudu::client::KuduTable 
> const*, kudu::PartitionKey, kudu::MonoTime const&, 
> kudu::client::internal::MetaCache::LookupType, 
> scoped_refptr<kudu::client::internal::RemoteTablet>*, std::function<void 
> (kudu::Status const&)> const&) (this=0xde431e0, table=0xf899300, 
> partition_key=..., deadline=..., 
> lookup_type=kudu::client::internal::MetaCache::LookupType::kPoint, 
> remote_tablet=0x0, callback=...)
>     at /mnt/source/kudu/kudu-345fd44ca3/src/kudu/client/meta_cache.cc:1405
> #10 0x00007f983ca0598c in 
> kudu::client::internal::MetaCacheServerPicker::PickLeader(std::function<void 
> (kudu::Status const&, kudu::client::internal::RemoteTabletServer*)> const&, 
> kudu::MonoTime const&) (this=0xdec0000, callback=..., deadline=...)
>     at /mnt/source/kudu/kudu-345fd44ca3/src/kudu/common/partition.h:153
> #11 0x00007f983ca0575f in std::function<void (kudu::Status 
> const&)>::operator()(kudu::Status const&) const (__args#0=..., 
> this=0x7f967bd8b8c0) at 
> /mnt/build/gcc-10.4.0/include/c++/10.4.0/bits/std_function.h:617
> #12 
> kudu::client::internal::MetaCache::LookupTabletByKey(kudu::client::KuduTable 
> const*, kudu::PartitionKey, kudu::MonoTime const&, 
> kudu::client::internal::MetaCache::LookupType, 
> scoped_refptr<kudu::client::internal::RemoteTablet>*, std::function<void 
> (kudu::Status const&)> const&) (this=0xde431e0, table=0xf899300, 
> partition_key=..., deadline=..., 
> lookup_type=kudu::client::internal::MetaCache::LookupType::kPoint, 
> remote_tablet=0x0, callback=...) at 
> /mnt/source/kudu/kudu-345fd44ca3/src/kudu/client/meta_cache.cc:1408
> #13 0x00007f983ca0598c in 
> kudu::client::internal::MetaCacheServerPicker::PickLeader(std::function<void 
> (kudu::Status const&, kudu::client::internal::RemoteTabletServer*)> const&, 
> kudu::MonoTime const&) (this=0xdec0000, callback=..., deadline=...)
>     at /mnt/source/kudu/kudu-345fd44ca3/src/kudu/common/partition.h:153
> #14 0x00007f983ca0575f in std::function<void (kudu::Status 
> const&)>::operator()(kudu::Status const&) const (__args#0=..., 
> this=0x7f967bd8bad0) at 
> /mnt/build/gcc-10.4.0/include/c++/10.4.0/bits/std_function.h:617
> #15 
> kudu::client::internal::MetaCache::LookupTabletByKey(kudu::client::KuduTable 
> const*, kudu::PartitionKey, kudu::MonoTime const&, 
> kudu::client::internal::MetaCache::LookupType, 
> scoped_refptr<kudu::client::internal::RemoteTablet>*, std::function<void 
> (kudu::Status const&)> const&) (this=0xde431e0, table=0xf899300, 
> partition_key=..., deadline=..., 
> lookup_type=kudu::client::internal::MetaCache::LookupType::kPoint, 
> remote_tablet=0x0, callback=...) at 
> /mnt/source/kudu/kudu-345fd44ca3/src/kudu/client/meta_cache.cc:1408
> #16 0x00007f983ca0598c in 
> kudu::client::internal::MetaCacheServerPicker::PickLeader(std::function<void 
> (kudu::Status const&, kudu::client::internal::RemoteTabletServer*)> const&, 
> kudu::MonoTime const&) (this=0xdec0000, callback=..., deadline=...)
>     at /mnt/source/kudu/kudu-345fd44ca3/src/kudu/common/partition.h:153
> #17 0x00007f983ca0575f in std::function<void (kudu::Status 
> const&)>::operator()(kudu::Status const&) const (__args#0=..., 
> this=0x7f967bd8bce0) at 
> /mnt/build/gcc-10.4.0/include/c++/10.4.0/bits/std_function.h:617
> #18 
> kudu::client::internal::MetaCache::LookupTabletByKey(kudu::client::KuduTable 
> const*, kudu::PartitionKey, kudu::MonoTime const&, 
> kudu::client::internal::MetaCache::LookupType, 
> scoped_refptr<kudu::client::internal::RemoteTablet>*, std::function<void 
> (kudu::Status const&)> const&) (this=0xde431e0, table=0xf899300, 
> partition_key=..., deadline=..., 
> lookup_type=kudu::client::internal::MetaCache::LookupType::kPoint, 
> remote_tablet=0x0, callback=...) at 
> /mnt/source/kudu/kudu-345fd44ca3/src/kudu/client/meta_cache.cc:1408
> #19 0x00007f983ca0598c in 
> kudu::client::internal::MetaCacheServerPicker::PickLeader(std::function<void 
> (kudu::Status const&, kudu::client::internal::RemoteTabletServer*)> const&, 
> kudu::MonoTime const&) (this=0xdec0000, callback=..., deadline=...)
>     at /mnt/source/kudu/kudu-345fd44ca3/src/kudu/common/partition.h:153
> #20 0x00007f983ca0575f in std::function<void (kudu::Status 
> const&)>::operator()(kudu::Status const&) const (__args#0=..., 
> this=0x7f967bd8bef0) at 
> /mnt/build/gcc-10.4.0/include/c++/10.4.0/bits/std_function.h:617
> #21 
> kudu::client::internal::MetaCache::LookupTabletByKey(kudu::client::KuduTable 
> const*, kudu::PartitionKey, kudu::MonoTime const&, 
> kudu::client::internal::MetaCache::LookupType, 
> scoped_refptr<kudu::client::internal::RemoteTablet>*, std::function<void 
> (kudu::Status const&)> const&) (this=0xde431e0, table=0xf899300, 
> partition_key=..., deadline=..., 
> lookup_type=kudu::client::internal::MetaCache::LookupType::kPoint, 
> remote_tablet=0x0, callback=...) at 
> /mnt/source/kudu/kudu-345fd44ca3/src/kudu/client/meta_cache.cc:1408
> #22 0x00007f983ca0598c in 
> kudu::client::internal::MetaCacheServerPicker::PickLeader(std::function<void 
> (kudu::Status const&, kudu::client::internal::RemoteTabletServer*)> const&, 
> kudu::MonoTime const&) (this=0xdec0000, callback=..., deadline=...)
>     at /mnt/source/kudu/kudu-345fd44ca3/src/kudu/common/partition.h:153
> #23 0x00007f983ca0575f in std::function<void (kudu::Status 
> const&)>::operator()(kudu::Status const&) const (__args#0=..., 
> this=0x7f967bd8c100) at 
> /mnt/build/gcc-10.4.0/include/c++/10.4.0/bits/std_function.h:617
> #24 
> kudu::client::internal::MetaCache::LookupTabletByKey(kudu::client::KuduTable 
> const*, kudu::PartitionKey, kudu::MonoTime const&, 
> kudu::client::internal::MetaCache::LookupType, 
> scoped_refptr<kudu::client::internal::RemoteTablet>*, std::function<void 
> (kudu::Status const&)> const&) (this=0xde431e0, table=0xf899300, 
> partition_key=..., deadline=..., 
> lookup_type=kudu::client::internal::MetaCache::LookupType::kPoint, 
> remote_tablet=0x0, callback=...) at 
> /mnt/source/kudu/kudu-345fd44ca3/src/kudu/client/meta_cache.cc:1408
> #25 0x00007f983ca0598c in 
> kudu::client::internal::MetaCacheServerPicker::PickLeader(std::function<void 
> (kudu::Status const&, kudu::client::internal::RemoteTabletServer*)> const&, 
> kudu::MonoTime const&) (this=0xdec0000, callback=..., deadline=...)
>     at /mnt/source/kudu/kudu-345fd44ca3/src/kudu/common/partition.h:153
> #26 0x00007f983ca0575f in std::function<void (kudu::Status 
> const&)>::operator()(kudu::Status const&) const (__args#0=..., 
> this=0x7f967bd8c310) at 
> /mnt/build/gcc-10.4.0/include/c++/10.4.0/bits/std_function.h:617
> #27 
> kudu::client::internal::MetaCache::LookupTabletByKey(kudu::client::KuduTable 
> const*, kudu::PartitionKey, kudu::MonoTime const&, 
> kudu::client::internal::MetaCache::LookupType, 
> scoped_refptr<kudu::client::internal::RemoteTablet>*, std::function<void 
> (kudu::Status const&)> const&) (this=0xde431e0, table=0xf899300, 
> partition_key=..., deadline=..., 
> lookup_type=kudu::client::internal::MetaCache::LookupType::kPoint, 
> remote_tablet=0x0, callback=...) at 
> /mnt/source/kudu/kudu-345fd44ca3/src/kudu/client/meta_cache.cc:1408
> ... continues ...
> #47617 0x00007f983ca0598c in 
> kudu::client::internal::MetaCacheServerPicker::PickLeader(std::function<void 
> (kudu::Status const&, kudu::client::internal::RemoteTabletServer*)> const&, 
> kudu::MonoTime const&) (this=0xdec0000, callback=..., deadline=...)
>     at /mnt/source/kudu/kudu-345fd44ca3/src/kudu/common/partition.h:153
> #47618 0x00007f983ca0575f in std::function<void (kudu::Status 
> const&)>::operator()(kudu::Status const&) const (__args#0=..., 
> this=0x7f967c589290) at 
> /mnt/build/gcc-10.4.0/include/c++/10.4.0/bits/std_function.h:617
> #47619 
> kudu::client::internal::MetaCache::LookupTabletByKey(kudu::client::KuduTable 
> const*, kudu::PartitionKey, kudu::MonoTime const&, 
> kudu::client::internal::MetaCache::LookupType, 
> scoped_refptr<kudu::client::internal::RemoteTablet>*, std::function<void 
> (kudu::Status const&)> const&) (this=0xde431e0, table=0xf899300, 
> partition_key=..., deadline=..., 
> lookup_type=kudu::client::internal::MetaCache::LookupType::kPoint, 
> remote_tablet=0x0, callback=...) at 
> /mnt/source/kudu/kudu-345fd44ca3/src/kudu/client/meta_cache.cc:1408
> #47620 0x00007f983ca0598c in 
> kudu::client::internal::MetaCacheServerPicker::PickLeader(std::function<void 
> (kudu::Status const&, kudu::client::internal::RemoteTabletServer*)> const&, 
> kudu::MonoTime const&) (this=0xdec0000, callback=..., deadline=...)
>     at /mnt/source/kudu/kudu-345fd44ca3/src/kudu/common/partition.h:153
> #47621 0x00007f983ca0575f in std::function<void (kudu::Status 
> const&)>::operator()(kudu::Status const&) const (__args#0=..., 
> this=0x7f967c5894a0) at 
> /mnt/build/gcc-10.4.0/include/c++/10.4.0/bits/std_function.h:617
> #47622 
> kudu::client::internal::MetaCache::LookupTabletByKey(kudu::client::KuduTable 
> const*, kudu::PartitionKey, kudu::MonoTime const&, 
> kudu::client::internal::MetaCache::LookupType, 
> scoped_refptr<kudu::client::internal::RemoteTablet>*, std::function<void 
> (kudu::Status const&)> const&) (this=0xde431e0, table=0xf899300, 
> partition_key=..., deadline=..., 
> lookup_type=kudu::client::internal::MetaCache::LookupType::kPoint, 
> remote_tablet=0x0, callback=...) at 
> /mnt/source/kudu/kudu-345fd44ca3/src/kudu/client/meta_cache.cc:1408
> #47623 0x00007f983ca0598c in 
> kudu::client::internal::MetaCacheServerPicker::PickLeader(std::function<void 
> (kudu::Status const&, kudu::client::internal::RemoteTabletServer*)> const&, 
> kudu::MonoTime const&) (this=0xdec0000, callback=..., deadline=...)
>     at /mnt/source/kudu/kudu-345fd44ca3/src/kudu/common/partition.h:153
> #47624 0x00007f983ca066a7 in std::function<void (kudu::Status 
> const&)>::operator()(kudu::Status const&) const (__args#0=..., 
> this=0xca50918) at 
> /mnt/build/gcc-10.4.0/include/c++/10.4.0/bits/std_function.h:617
> #47625 
> kudu::client::internal::LookupRpc::SendRpcCb (this=0xca50800, status=...) at 
> /mnt/source/kudu/kudu-345fd44ca3/src/kudu/client/meta_cache.cc:966
> #47626 0x00007f983c9db65c in 
> kudu::client::internal::AsyncLeaderMasterRpc<kudu::master::GetTableLocationsRequestPB,
>  
> kudu::master::GetTableLocationsResponsePB>::SendRpc()::{lambda()#1}::operator()()
>  const (this=<optimized out>, this=<optimized out>)
>     at /mnt/source/kudu/kudu-345fd44ca3/src/kudu/util/status.h:230#47627 
> std::__invoke_impl<void, 
> kudu::client::internal::AsyncLeaderMasterRpc<kudu::master::GetTableLocationsRequestPB,
>  
> kudu::master::GetTableLocationsResponsePB>::SendRpc()::{lambda()#1}&>(std::__invoke_other,
>  
> kudu::client::internal::AsyncLeaderMasterRpc<kudu::master::GetTableLocationsRequestPB,
>  kudu::master::GetTableLocationsResponsePB>::SendRpc()::{lambda()#1}&) 
> (__f=...) at /mnt/build/gcc-10.4.0/include/c++/10.4.0/bits/invoke.h:60
> #47628 std::__invoke_r<void, 
> kudu::client::internal::AsyncLeaderMasterRpc<kudu::master::GetTableLocationsRequestPB,
>  
> kudu::master::GetTableLocationsResponsePB>::SendRpc()::{lambda()#1}&>(void&&, 
> (kudu::client::internal::AsyncLeaderMasterRpc<kudu::master::GetTableLocationsRequestPB,
>  kudu::master::GetTableLocationsResponsePB>::SendRpc()::{lambda()#1}&)...) 
> (__fn=...) at /mnt/build/gcc-10.4.0/include/c++/10.4.0/bits/invoke.h:110
> #47629 std::_Function_handler<void (), 
> kudu::client::internal::AsyncLeaderMasterRpc<kudu::master::GetTableLocationsRequestPB,
>  
> kudu::master::GetTableLocationsResponsePB>::SendRpc()::{lambda()#1}>::_M_invoke(std::_Any_data
>  const&) (__functor=...)
>     at /mnt/build/gcc-10.4.0/include/c++/10.4.0/bits/std_function.h:291
> #47630 0x00007f983cac860b in std::function<void ()>::operator()() const 
> (this=0xee3f9c0) at 
> /mnt/build/gcc-10.4.0/include/c++/10.4.0/bits/std_function.h:617
> #47631 kudu::rpc::OutboundCall::CallCallback (this=0xee3f840) at 
> /mnt/source/kudu/kudu-345fd44ca3/src/kudu/rpc/outbound_call.cc:309
> #47632 0x00007f983cabb763 in kudu::rpc::Connection::HandleCallResponse 
> (this=0xcd00700, transfer=...) at 
> /mnt/build/gcc-10.4.0/include/c++/10.4.0/bits/unique_ptr.h:172
> #47633 0x00007f983cabc215 in kudu::rpc::Connection::ReadHandler 
> (this=0xcd00700, watcher=..., revents=<optimized out>) at 
> /mnt/build/gcc-10.4.0/include/c++/10.4.0/bits/unique_ptr.h:172
> #47634 
> 0x00007f983cdb3ffb in ev_invoke_pending (loop=0xcc99b00) at 
> /mnt/source/kudu/kudu-345fd44ca3/thirdparty/src/libev-4.20/ev.c:3155
> #47635 0x00007f983ca97cc8 in kudu::rpc::ReactorThread::InvokePendingCb 
> (loop=0xcc99b00) at 
> /mnt/source/kudu/kudu-345fd44ca3/src/kudu/rpc/reactor.cc:202
> #47636 0x00007f983cdb73f7 in ev_run (flags=0, loop=0xcc99b00) at 
> /mnt/source/kudu/kudu-345fd44ca3/thirdparty/src/libev-4.20/ev.c:3555
> #47637 ev_run (loop=0xcc99b00, flags=0) at 
> /mnt/source/kudu/kudu-345fd44ca3/thirdparty/src/libev-4.20/ev.c:3402
> #47638 0x00007f983ca98bd9 in ev::loop_ref::run (flags=0, this=0xef75be0) at 
> /mnt/source/kudu/kudu-345fd44ca3/thirdparty/installed/uninstrumented/include/ev++.h:211
> #47639 kudu::rpc::ReactorThread::RunThread (this=0xef75bd8) at 
> /mnt/source/kudu/kudu-345fd44ca3/src/kudu/rpc/reactor.cc:503
> #47640 0x00007f983cc2d36c in std::function<void ()>::operator()() const 
> (this=0xec68358) at 
> /mnt/build/gcc-10.4.0/include/c++/10.4.0/bits/std_function.h:617
> #47641 kudu::Thread::SuperviseThread (arg=0xec68300) at 
> /mnt/source/kudu/kudu-345fd44ca3/src/kudu/util/thread.cc:691
> #47642 0x00007f983dfec609 in start_thread (arg=<optimized out>) at 
> pthread_create.c:477
> #47643 0x00007f983c01c133 in clone () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> {noformat}
> It hits a SIGSEGV because the stack gets blown out.
> Here are the steps to reproduce it from Impala:
> {noformat}
> /** 1. Create table **/
> drop table if exists impala_crash;
> create table if not exists impala_crash (
> dt string,
> col string,
> primary key(dt)
> )
> partition by range(dt) (
> partition values <= '00000000'
> )
> stored as kudu;
> /** 2. alter and insert **/
> alter table impala_crash drop if exists range partition value='20230301';
> alter table impala_crash add if not exists range partition value='20230301';
> insert into impala_crash values ('20230301','abc');
> /* normal */
> /** 3. Run the same queries again and impala daemon crashes **/
> alter table impala_crash drop if exists range partition value='20230301';
> alter table impala_crash add if not exists range partition value='20230301';
> insert into impala_crash values ('20230301','abc');
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
