[
https://issues.apache.org/jira/browse/IMPALA-14220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18004965#comment-18004965
]
Quanlong Huang commented on IMPALA-14220:
-----------------------------------------
Here is an example stack of a GetPartialCatalogObject request that is blocked
in this lock:
{noformat}
Thread 77 (Thread 0x7ff22e7f3640 (LWP 11612) "catalogd"):
#0 0x00007ff287c873f0 in __lll_lock_wait () from /lib64/libc.so.6
#1 0x00007ff287c8d582 in pthread_mutex_lock@@GLIBC_2.2.5 () from
/lib64/libc.so.6
#2 0x0000000000f2d589 in __gthread_mutex_lock (__mutex=0x7ffc2797d738) at
/grid/0/jenkins/workspace/workspace/CDH-parallel-redhat9/SOURCES/impala/toolchain/toolchain-packages-gcc10.4.0/gcc-10.4.0/include/c++/10.4.0/x86_64-pc-linux-gnu/bits/gthr-default.h:749
#3 std::mutex::lock (this=0x7ffc2797d738) at
/grid/0/jenkins/workspace/workspace/CDH-parallel-redhat9/SOURCES/impala/toolchain/toolchain-packages-gcc10.4.0/gcc-10.4.0/include/c++/10.4.0/bits/std_mutex.h:100
#4 std::lock_guard<std::mutex>::lock_guard (__m=..., this=<synthetic pointer>)
at
/grid/0/jenkins/workspace/workspace/CDH-parallel-redhat9/SOURCES/impala/toolchain/toolchain-packages-gcc10.4.0/gcc-10.4.0/include/c++/10.4.0/bits/std_mutex.h:159
#5 impala::CatalogServer::IsActive (this=0x7ffc2797d660) at
catalog-server.cc:685
#6 0x0000000000f53288 in impala::CatalogServiceThriftIf::AcceptRequest
(client_version=impala::CatalogServiceVersion::V2, this=0x8a1e090) at
catalog-server.cc:438
#7 impala::CatalogServiceThriftIf::GetPartialCatalogObject (this=0x8a1e090,
resp=..., req=...) at catalog-server.cc:371
#8 0x0000000000f1a3df in
impala::CatalogServiceProcessorT<apache::thrift::protocol::TDummyProtocol>::process_GetPartialCatalogObject
(this=0x7d7fce0, seqid=0, iprot=<optimized out>, oprot=<optimized out>,
callContext=<optimized out>) at
../../generated-sources/gen-cpp/CatalogService.tcc:2990
#9 0x0000000000efa063 in
impala::CatalogServiceProcessorT<apache::thrift::protocol::TDummyProtocol>::dispatchCall
(this=<optimized out>, iprot=0x9cc2dc0, oprot=0x9cc2f40, fname=..., seqid=0,
callContext=0x9258780) at
../../generated-sources/gen-cpp/CatalogService.tcc:2172
#10 0x0000000000f01abb in apache::thrift::TDispatchProcessor::process
(this=0x7d7fce0, in=..., out=..., connectionContext=0x9258780) at
../../../toolchain/toolchain-packages-gcc10.4.0/thrift-0.16.0-p6/include/thrift/TDispatchProcessor.h:121
#11 0x0000000001405dea in apache::thrift::server::TAcceptQueueServer::Task::run
(this=0x9966120) at TAcceptQueueServer.cpp:86
...{noformat}
> IsActive checks blocked by the getCatalogDelta operation when there are slow
> DDLs
> ---------------------------------------------------------------------------------
>
> Key: IMPALA-14220
> URL: https://issues.apache.org/jira/browse/IMPALA-14220
> Project: IMPALA
> Issue Type: Bug
> Components: Backend, Catalog
> Reporter: Quanlong Huang
> Priority: Blocker
>
> When catalogd HA is enabled, catalogd will check whether it's the active one
> before serving each request, i.e. in
> [AcceptRequest()|https://github.com/apache/impala/blob/8d56eea72518aa11a36aa086dc8961bc8cdbd1fd/be/src/catalog/catalog-server.cc#L593]:
> {code:cpp}
> Status AcceptRequest(CatalogServiceVersion::type client_version) {
> ...
> } else if (FLAGS_enable_catalogd_ha && !catalog_server_->IsActive()) {
> status = Status(Substitute("Request for Catalog service is rejected since "
> "catalogd $0 is in standby mode", server_address_));
> }
> {code}
> This check requires holding the catalog_lock_:
> {code:cpp}
> bool CatalogServer::IsActive() {
> lock_guard<mutex> l(catalog_lock_);
> return is_active_;
> }{code}
> [https://github.com/apache/impala/blob/8d56eea72518aa11a36aa086dc8961bc8cdbd1fd/be/src/catalog/catalog-server.cc#L896]
> This lock is also held by
> [GatherCatalogUpdatesThread|https://github.com/apache/impala/blob/8d56eea72518aa11a36aa086dc8961bc8cdbd1fd/be/src/catalog/catalog-server.cc#L905]
> (a.k.a. the topic update thread), which invokes the JNI method GetCatalogDelta
> to collect catalog updates.
> It's known that collecting catalog updates can be blocked by slow DDLs that
> hold the table lock for a long time (IMPALA-6671). The topic update thread
> usually waits for 1 minute (configured by topic_update_tbl_max_wait_time_ms /
> 2) on the table lock and then skips it with a warning like this:
> {noformat}
> Table tpch.lineitem (version=2373, lastSeen=2373) is skipping topic update
> (2387, 2388] due to lock contention{noformat}
> If the table has been skipped 3 consecutive times (configured by
> catalog_max_lock_skipped_topic_updates), the topic update thread will wait
> indefinitely on it the next time.
> So when the topic update thread is slow in collecting one round of updates,
> it holds the catalog_lock_ for a long time and blocks all new requests on
> this catalogd. This impacts the performance of all queries that require
> loading metadata from catalogd.
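
To illustrate the contention described above, here is a minimal standalone C++ sketch. It is not Impala code: LockedServer, AtomicServer, GatherUpdates() and IsActiveLatency() are hypothetical names, and the sleep stands in for a slow GetCatalogDelta round. The atomic variant is only one possible way to keep the IsActive check off catalog_lock_. It is not necessarily how this issue is or will be fixed, and it assumes is_active_ does not have to be read consistently with other state guarded by the lock.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

using std::chrono::milliseconds;

// Hypothetical stand-in for CatalogServer (not the real class): IsActive()
// takes catalog_lock_, as in the snippet quoted above.
class LockedServer {
 public:
  bool IsActive() {
    std::lock_guard<std::mutex> l(catalog_lock_);
    return is_active_;
  }
  // Simulates one slow round of the topic update thread. 'holding' is set once
  // the lock is taken so the caller knows when contention starts.
  void GatherUpdates(milliseconds hold, std::atomic<bool>& holding) {
    std::lock_guard<std::mutex> l(catalog_lock_);
    holding.store(true);
    std::this_thread::sleep_for(hold);
  }

 private:
  std::mutex catalog_lock_;
  bool is_active_ = true;
};

// Assumed alternative: the flag is an atomic, so readers never touch
// catalog_lock_ even while the update thread holds it.
class AtomicServer {
 public:
  bool IsActive() const { return is_active_.load(std::memory_order_acquire); }
  void GatherUpdates(milliseconds hold, std::atomic<bool>& holding) {
    std::lock_guard<std::mutex> l(catalog_lock_);
    holding.store(true);
    std::this_thread::sleep_for(hold);
  }

 private:
  std::mutex catalog_lock_;
  std::atomic<bool> is_active_{true};
};

// Measures how long a single IsActive() call takes while another thread is
// inside GatherUpdates() holding catalog_lock_ for 'hold'.
template <typename Server>
milliseconds IsActiveLatency(Server& server, milliseconds hold) {
  assert(hold.count() > 0);
  std::atomic<bool> holding{false};
  std::thread updater([&] { server.GatherUpdates(hold, holding); });
  // Wait until the updater really owns the lock before timing the check.
  while (!holding.load()) std::this_thread::yield();
  auto start = std::chrono::steady_clock::now();
  server.IsActive();
  auto elapsed = std::chrono::steady_clock::now() - start;
  updater.join();
  return std::chrono::duration_cast<milliseconds>(elapsed);
}
```

Calling IsActiveLatency() with a hold of 200 ms should show the locked variant waiting for most of the hold time, while the atomic variant should return almost immediately. In the real catalogd the hold time is the duration of one topic update round, which can be minutes when a slow DDL holds a table lock.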