Riza Suminto has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/22634 )
Change subject: IMPALA-13850 (part 1): Wait until CatalogD active before resetting
......................................................................

IMPALA-13850 (part 1): Wait until CatalogD active before resetting

In HA mode, CatalogD initialization can fail to complete within a
reasonable time. Log messages showed that CatalogD is blocked trying to
acquire "CatalogServer.catalog_lock_" when calling
CatalogServer::UpdateActiveCatalogd() during statestore subscriber
registration. catalog_lock_ was held by GatherCatalogUpdatesThread, which
was calling GetCatalogDelta(). That call waits for the Java lock
versionLock_, which was held by the thread running
CatalogServiceCatalog.reset().

This patch removes the catalog reset from the JniCatalog constructor. In
turn, catalog-server.cc is now responsible for triggering the metadata
reset (Invalidate Metadata) only if:

1. It is the active CatalogD, and
2. The gathering thread has collected the first topic update, or CatalogD
   is set with a catalog_topic_mode other than "minimal".

The latter prerequisite ensures that coordinators are not blocked waiting
for a full topic update in on-demand metadata mode. This is all managed by
a new thread method, TriggerResetMetadata, that monitors these conditions
and triggers the initial metadata reset.

Note that this is a behavior change in on-demand catalog mode
(catalog_topic_mode=minimal). Previously, on-demand catalog mode would
send the full database list in its first catalog topic update. This
behavior change is OK since coordinators can request metadata on demand.

After this patch, catalog-server.active-status and the /healthz page can
turn into true and OK respectively even if the very first metadata reset
is still ongoing. Observers that care about having fully populated
metadata should check other metrics such as catalog.num-db,
catalog.num-tables, or the /catalog page content.
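The two prerequisites for the initial reset can be illustrated with a small Python model. This is only a sketch of the gating logic as described in the commit message, not the code in catalog-server.cc: the class ResetGate, its methods, and the function trigger_reset_metadata are hypothetical names invented for this example, and the real implementation is a C++ thread method.

```python
import threading


class ResetGate:
    """Toy model (hypothetical) of the conditions gating the initial reset."""

    def __init__(self, catalog_topic_mode):
        self._cond = threading.Condition()
        self._topic_mode = catalog_topic_mode
        self._is_active = False
        self._first_update_collected = False

    def _ready(self):
        # Prerequisite 1: this catalogd is the active one.
        # Prerequisite 2: the first topic update has been collected, or the
        # topic mode is something other than "minimal".
        return self._is_active and (
            self._first_update_collected or self._topic_mode != "minimal")

    def set_active(self, is_active):
        with self._cond:
            self._is_active = is_active
            self._cond.notify_all()

    def mark_first_update_collected(self):
        with self._cond:
            self._first_update_collected = True
            self._cond.notify_all()

    def wait_until_ready(self, timeout=None):
        # Returns True once both prerequisites hold, False on timeout.
        with self._cond:
            return self._cond.wait_for(self._ready, timeout=timeout)


def trigger_reset_metadata(gate, reset_fn, timeout=None):
    """Wait for the gate to open, then run reset_fn once outside the lock.

    Returns True if the reset ran, False if the wait timed out.
    """
    if gate.wait_until_ready(timeout):
        reset_fn()
        return True
    return False
```

In this model the reset callback runs only after the wait returns, so the condition lock is not held while the (potentially slow) reset executes.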
Updated the start-impala-cluster.py readiness check to wait for at least
1 table to be seen by coordinators, except during create-load-data.sh
execution (there is no table yet) and when use_local_catalog=true (the
local catalog cache does not start with any table). Modified startup flag
checking from reading the actual command line args to reading the
'/varz?json' page of the daemon. Cleaned up impala_service.py to fix some
flake8 issues. Slightly updated
TestLocalCatalogCompactUpdates::test_restart_catalogd so that
unique_database cleanup is successful.

Testing:
- Refactored test_catalogd_ha.py to reduce repeated code, use the
  unique_database fixture, and additionally validate the /healthz page of
  both active and standby catalogd. Changed it to test using the hs2
  protocol by default.
- Ran and passed test_catalogd_ha.py and test_concurrent_ddls.py.
- Passed core tests.

Change-Id: I58cc66dcccedb306ff11893f2916ee5ee6a3efc1
Reviewed-on: http://gerrit.cloudera.org:8080/22634
Reviewed-by: Riza Suminto <riza.sumi...@cloudera.com>
Tested-by: Riza Suminto <riza.sumi...@cloudera.com>
---
M be/src/catalog/catalog-server.cc
M be/src/catalog/catalog-server.h
M bin/start-impala-cluster.py
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/main/java/org/apache/impala/util/DebugUtils.java
M testdata/bin/create-load-data.sh
M tests/common/impala_cluster.py
M tests/common/impala_service.py
M tests/custom_cluster/test_catalogd_ha.py
M tests/custom_cluster/test_local_catalog.py
M tests/custom_cluster/test_process_failures.py
12 files changed, 330 insertions(+), 220 deletions(-)

Approvals:
  Riza Suminto: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/22634
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I58cc66dcccedb306ff11893f2916ee5ee6a3efc1
Gerrit-Change-Number: 22634
Gerrit-PatchSet: 20
Gerrit-Owner: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>