Hello Quanlong Huang, Abhishek Rawat, Wenzhe Zhou, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/22634

to look at the new patch set (#7).

Change subject: IMPALA-13850: Wait until CatalogD active before resetting 
metadata
......................................................................

IMPALA-13850: Wait until CatalogD active before resetting metadata

In HA mode, CatalogD initialization can fail to complete within a
reasonable time. Log messages showed that CatalogD was blocked trying to
acquire "CatalogServer.catalog_lock_" when calling
CatalogServer::UpdateActiveCatalogd() during statestore subscriber
registration. catalog_lock_ was held by GatherCatalogUpdatesThread, which
was calling GetCatalogDelta(). That call in turn waits for the Java lock
versionLock_, which is held by the thread running
CatalogServiceCatalog.reset().
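
The blocking chain described above can be illustrated with a small,
self-contained Python sketch. It is only an analogy, not Impala code: the
two threading.Lock objects stand in for versionLock_ and catalog_lock_,
and the function and event names are hypothetical.

```python
import threading


def demo_blocking_chain(timeout_s=0.2):
    """Return whether the 'registration' step could take catalog_lock."""
    version_lock = threading.Lock()  # stands in for the Java versionLock_
    catalog_lock = threading.Lock()  # stands in for catalog_lock_
    reset_started = threading.Event()
    reset_may_finish = threading.Event()
    gather_holds_catalog_lock = threading.Event()

    def reset():
        # Long-running reset: holds version_lock until told to finish.
        with version_lock:
            reset_started.set()
            reset_may_finish.wait()

    def gather_updates():
        # Takes catalog_lock first, then blocks on version_lock.
        reset_started.wait()
        with catalog_lock:
            gather_holds_catalog_lock.set()
            with version_lock:
                pass

    threads = [threading.Thread(target=reset),
               threading.Thread(target=gather_updates)]
    for t in threads:
        t.start()
    gather_holds_catalog_lock.wait()

    # Analogue of UpdateActiveCatalogd(): needs catalog_lock, but it stays
    # held for as long as the reset is running.
    acquired = catalog_lock.acquire(timeout=timeout_s)
    if acquired:
        catalog_lock.release()

    reset_may_finish.set()  # let the reset complete so both threads exit
    for t in threads:
        t.join()
    return acquired
```

In this sketch the registration step times out for as long as the reset
keeps running, which mirrors the stall seen in the logs.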

This patch removes the catalog reset from the JniCatalog constructor. In
turn, catalog-server.cc is now responsible for triggering the metadata
reset (Invalidate Metadata), and does so only if it is the active
CatalogD and the gathering thread has collected the first topic update.
The latter prerequisite ensures that coordinators are not blocked waiting
for a full topic update. This is all managed by a new thread method,
TriggerResetMetadata, that monitors these conditions and triggers the
initial metadata reset.
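
As a rough sketch of the gating logic described above (not the actual
C++ implementation), the trigger thread can be modeled in Python with a
condition variable. The class name, method names, and the reset_fn
callback are all hypothetical.

```python
import threading


class ResetTrigger:
    """Hypothetical model of the conditions TriggerResetMetadata waits on."""

    def __init__(self, reset_fn):
        self._cond = threading.Condition()
        self._is_active = False
        self._first_update_collected = False
        self._reset_fn = reset_fn  # hypothetical callback doing the reset
        self.reset_done = threading.Event()

    def set_active(self, active):
        # Called when this catalogd learns whether it is the active one.
        with self._cond:
            self._is_active = active
            self._cond.notify_all()

    def mark_first_topic_update_collected(self):
        # Called once the gathering thread has collected its first update.
        with self._cond:
            self._first_update_collected = True
            self._cond.notify_all()

    def run(self):
        # Block until both prerequisites hold, then release the condition
        # lock before running the (potentially slow) reset.
        with self._cond:
            self._cond.wait_for(
                lambda: self._is_active and self._first_update_collected)
        self._reset_fn()
        self.reset_done.set()
```

Running the reset outside the condition lock is a deliberate choice in
this sketch, so that status updates are never blocked behind the reset.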

After this patch, the catalog-server.active-status metric and the
/healthz page can turn true and OK, respectively, even while the very
first metadata reset is still ongoing. Observers that care about having
fully populated metadata should check other metrics such as
catalog.num-db and catalog.num-tables, or the /catalog page content.

Updated the start-impala-cluster.py readiness check to wait until
coordinators see at least 1 table, except during create-load-data.sh
execution.
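
A readiness check of this kind could look roughly like the polling loop
below. This is a minimal sketch and not the code in
start-impala-cluster.py: get_metric is a hypothetical callable that
returns the current value of a named metric (or None if it is not yet
available), and the timeout and interval defaults are arbitrary.

```python
import time


def wait_for_tables(get_metric, min_tables=1, timeout_s=60.0, interval_s=0.5):
    """Poll catalog.num-tables until it reaches min_tables or time runs out.

    get_metric is a hypothetical callable: metric name -> number or None.
    Returns True if the threshold was reached, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        value = get_metric("catalog.num-tables")
        if value is not None and value >= min_tables:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

The same loop can serve observers that need populated metadata rather
than just an active catalogd, for example by polling catalog.num-db.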

Testing:
- Refactored test_catalogd_ha.py to reduce repeated code, use the
  unique_database fixture, and additionally validate the /healthz page
  of both the active and standby catalogd. Changed it to test using the
  hs2 protocol by default.
- Ran and passed test_catalogd_ha.py and test_concurrent_ddls.py.

Change-Id: I58cc66dcccedb306ff11893f2916ee5ee6a3efc1
---
M be/src/catalog/catalog-server.cc
M be/src/catalog/catalog-server.h
M bin/start-impala-cluster.py
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/main/java/org/apache/impala/util/DebugUtils.java
M testdata/bin/create-load-data.sh
M tests/common/impala_cluster.py
M tests/custom_cluster/test_catalogd_ha.py
9 files changed, 275 insertions(+), 201 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/22634/7
--
To view, visit http://gerrit.cloudera.org:8080/22634
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I58cc66dcccedb306ff11893f2916ee5ee6a3efc1
Gerrit-Change-Number: 22634
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
