[ 
https://issues.apache.org/jira/browse/IMPALA-14220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18005286#comment-18005286
 ] 

ASF subversion and git services commented on IMPALA-14220:
----------------------------------------------------------

Commit d41d325b4154f9526991b6fb568b59fa1ffe5501 in impala's branch 
refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=d41d325b4 ]

IMPALA-14220: CatalogServer::IsActive must not hold catalog_lock_

When catalogd HA is enabled, catalogd will check whether it is the
active one, via CatalogServer::IsActive, before serving each request.
Calling CatalogServer::IsActive requires obtaining catalog_lock_, which
can contend with long-running catalog operations such as
GatherCatalogUpdatesThread.

Checking the current catalog active status does not need to obtain
catalog_lock_. Instead, it is sufficient to change the is_active_ field
from boolean to AtomicBoolean. This patch applies that change.
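
The change described above can be illustrated with a minimal sketch. This is
not the Impala source: CatalogServerSketch, SetActive, ReadWhileLockHeld and
UsageExample are hypothetical names, and std::atomic<bool> stands in for the
AtomicBoolean type mentioned in the commit message.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical stand-in for the relevant part of CatalogServer.
class CatalogServerSketch {
 public:
  // Old shape (per the issue description): take catalog_lock_, then read a
  // plain bool. That read blocks for as long as another thread holds the lock.
  //
  // New shape: the flag is atomic, so the read needs no lock at all.
  bool IsActive() const { return is_active_.load(); }

  // Hypothetical setter; the real code updates the flag elsewhere.
  void SetActive(bool active) { is_active_.store(active); }

  // Still protects other catalog state; exposed here only for the demo below.
  std::mutex catalog_lock_;

 private:
  std::atomic<bool> is_active_{false};
};

// Reads the flag while catalog_lock_ is held, as it would be during a long
// topic update. The atomic read returns immediately instead of blocking.
inline bool ReadWhileLockHeld(CatalogServerSketch& server) {
  std::lock_guard<std::mutex> held(server.catalog_lock_);
  return server.IsActive();
}

// Usage: flip the flag and read it back without waiting on catalog_lock_.
inline void UsageExample() {
  CatalogServerSketch server;
  assert(!server.IsActive());
  server.SetActive(true);
  assert(ReadWhileLockHeld(server));
}
```

With a mutex-guarded bool, the equivalent of ReadWhileLockHeld would wait until
the lock holder finished, which is the contention this patch removes.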

With this change, CatalogServiceThriftIf::AcceptRequest can return
faster. Other CatalogServiceThriftIf methods that previously blocked on
the AcceptRequest method can proceed faster, but might still contend over
Catalog's versionLock_ in the JVM later.

Testing:
Ran and passed test_catalogd_ha.py.

Change-Id: I15fb925f1eb4ea5d213075b66a676d2bc9b9e9f1
Reviewed-on: http://gerrit.cloudera.org:8080/23168
Reviewed-by: Abhishek Rawat <[email protected]>
Reviewed-by: Wenzhe Zhou <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> IsActive checks blocked by the getCatalogDelta operation when there are slow 
> DDLs
> ---------------------------------------------------------------------------------
>
>                 Key: IMPALA-14220
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14220
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend, Catalog
>            Reporter: Quanlong Huang
>            Assignee: Riza Suminto
>            Priority: Blocker
>
> When catalogd HA is enabled, catalogd will check whether it's the active one 
> before serving each request, i.e. in 
> [AcceptRequest()|https://github.com/apache/impala/blob/8d56eea72518aa11a36aa086dc8961bc8cdbd1fd/be/src/catalog/catalog-server.cc#L593]:
> {code:cpp}
>   Status AcceptRequest(CatalogServiceVersion::type client_version) {
>     ...
>     } else if (FLAGS_enable_catalogd_ha && !catalog_server_->IsActive()) {
>       status = Status(Substitute("Request for Catalog service is rejected since "
>           "catalogd $0 is in standby mode", server_address_));
>     }
> {code}
> This check requires holding the catalog_lock_:
> {code:cpp}
> bool CatalogServer::IsActive() {
>   lock_guard<mutex> l(catalog_lock_);
>   return is_active_;
> }{code}
> [https://github.com/apache/impala/blob/8d56eea72518aa11a36aa086dc8961bc8cdbd1fd/be/src/catalog/catalog-server.cc#L896]
> This lock is also held by 
> [GatherCatalogUpdatesThread|https://github.com/apache/impala/blob/8d56eea72518aa11a36aa086dc8961bc8cdbd1fd/be/src/catalog/catalog-server.cc#L905]
>  (a.k.a. topic update thread) which invokes JNI method GetCatalogDelta to 
> collect catalog updates.
> It's known that collecting catalog updates could be blocked by slow DDLs that 
> hold the table lock for a long time (IMPALA-6671). The topic update thread 
> usually waits for 1 minute (configured by topic_update_tbl_max_wait_time_ms / 
> 2) on the table lock and then skips it with a warning like this:
> {noformat}
> Table tpch.lineitem (version=2373, lastSeen=2373) is skipping topic update 
> (2387, 2388] due to lock contention{noformat}
> If the table hasn't been collected 3 consecutive times (configured by 
> catalog_max_lock_skipped_topic_updates), the topic update thread will wait 
> indefinitely on it the next time.
> So when the topic update thread is slow in collecting one round of updates, 
> it holds the catalog_lock_ for a long time and blocks all new requests on 
> this catalogd. This impacts the performance of all queries that require loading 
> metadata from catalogd.
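
The skip-then-wait policy described in the quoted issue can be sketched as a
small decision function. This is an illustration only, not Impala code:
TableLockTimeout and kWaitIndefinitely are hypothetical names, and the values
used in the usage note (120000 ms, 3 skips) are inferred from the description
above rather than read from the source tree.

```cpp
#include <cassert>
#include <chrono>

// Sentinel meaning "block on the table lock with no timeout" (hypothetical).
const std::chrono::milliseconds kWaitIndefinitely =
    std::chrono::milliseconds::max();

// Returns how long the topic update thread would wait on one table lock.
//   consecutive_skips:   times in a row this table was skipped so far
//   max_skipped_updates: analogous to catalog_max_lock_skipped_topic_updates
//   max_wait_time:       analogous to topic_update_tbl_max_wait_time_ms
std::chrono::milliseconds TableLockTimeout(
    int consecutive_skips, int max_skipped_updates,
    std::chrono::milliseconds max_wait_time) {
  if (consecutive_skips >= max_skipped_updates) {
    // Skipped too many times already: wait until the lock is available.
    return kWaitIndefinitely;
  }
  // Otherwise wait for half of the configured maximum, then skip the table.
  return max_wait_time / 2;
}
```

For example, TableLockTimeout(0, 3, std::chrono::milliseconds(120000)) yields a
60 second wait, while TableLockTimeout(3, 3, std::chrono::milliseconds(120000))
yields kWaitIndefinitely. While catalog_lock_ was held across such a wait, every
IsActive call blocked behind it.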



