Hello Quanlong Huang, k.venureddy2...@gmail.com, Sai Hemanth Gantasala, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/22640 to look at the new patch set (#19).

Change subject: IMPALA-13850 (part 2): Implement in-place reset for CatalogD
......................................................................

IMPALA-13850 (part 2): Implement in-place reset for CatalogD

This patch improves the availability of CatalogD during large INVALIDATE METADATA operations. Previously, CatalogServiceCatalog.reset() held versionLock_.writeLock() for the whole duration of the reset. When the number of databases, tables, or functions is large, this write lock can be held for a long time, preventing any other catalog operation from proceeding.

This patch improves the situation by making CatalogServiceCatalog.reset() perform invalidation in stages and temporarily release the write lock between stages. To do so, Db additions, updates, and removals must happen directly in dbCache_. reset() enforces lexicographic order and ensures that every Db invalidation within a single stage is complete before releasing the write lock. Stages should take approximately the same amount of time.

A catalog operation on a database must ensure either that no reset operation is currently running, or that the database name is lexicographically less than the name of the current database-under-invalidation. This patch adds the waitOngoingResetMetadata() method to facilitate that waiting. The caller must hold versionLock_.writeLock() before calling waitOngoingResetMetadata().

CatalogServiceCatalog.getAllDbs() returns a snapshot of the dbCache_ values at a single point in time. With this patch, a Db in this snapshot may be removed from dbCache_ by a concurrent reset(). Callers that care about snapshot integrity, such as CatalogServiceCatalog.getCatalogDelta(), must be careful when iterating the snapshot: they must iterate in lexicographic order, as reset() does, and must not go beyond the current database-under-invalidation.
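To make the staging and waiting rules concrete, here is a minimal, hypothetical Java sketch; it is not the code in this patch. StagedResetSketch, dbsPerStage, the betweenStages hook, isSafeToOperate() and waitOngoingReset() are illustrative names (waitOngoingReset() only mimics the contract described for waitOngoingResetMetadata()), and a sorted map of version numbers stands in for dbCache_ and real Db reloading.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Hypothetical model of a staged, in-place reset. All names are illustrative
 * assumptions; this is not Impala's CatalogServiceCatalog code.
 */
final class StagedResetSketch {
  private final ReentrantReadWriteLock versionLock = new ReentrantReadWriteLock(true);
  private final Condition resetProgress = versionLock.writeLock().newCondition();
  // Stand-in for dbCache_: Db name -> version. Sorted, so a reset visits Dbs
  // in lexicographic order.
  private final TreeMap<String, Long> dbCache = new TreeMap<>();
  // Next Db to be invalidated, or null when no reset is running.
  private String dbUnderInvalidation = null;
  private long catalogVersion = 0;

  StagedResetSketch(List<String> dbNames) {
    for (String name : dbNames) dbCache.put(name, 0L);
  }

  ReentrantReadWriteLock lock() { return versionLock; }

  /** Version of a Db; the caller should hold the lock. */
  long versionOf(String dbName) { return dbCache.get(dbName); }

  /** Reloads every Db in place, releasing the write lock between stages. */
  void reset(int dbsPerStage, Runnable betweenStages) {
    if (dbsPerStage <= 0) throw new IllegalArgumentException("dbsPerStage");
    List<String> names = new ArrayList<>();
    versionLock.writeLock().lock();
    try {
      names.addAll(dbCache.keySet()); // keySet() of a TreeMap is sorted
      dbUnderInvalidation = names.isEmpty() ? null : names.get(0);
    } finally {
      versionLock.writeLock().unlock();
    }
    for (int start = 0; start < names.size(); start += dbsPerStage) {
      int end = Math.min(start + dbsPerStage, names.size());
      versionLock.writeLock().lock();
      try {
        // Finish every Db in this stage before releasing the lock.
        for (int i = start; i < end; i++) {
          dbCache.computeIfPresent(names.get(i), (k, v) -> ++catalogVersion);
        }
        dbUnderInvalidation = end < names.size() ? names.get(end) : null;
        resetProgress.signalAll(); // wake operations waiting on this reset
      } finally {
        versionLock.writeLock().unlock();
      }
      // Lock released: other catalog operations may run between stages.
      if (end < names.size()) betweenStages.run();
    }
  }

  /** True if an operation on dbName cannot race the reset; hold the lock. */
  boolean isSafeToOperate(String dbName) {
    return dbUnderInvalidation == null || dbName.compareTo(dbUnderInvalidation) < 0;
  }

  /** Mimics the waitOngoingResetMetadata() contract: write lock must be held. */
  void waitOngoingReset(String dbName) throws InterruptedException {
    if (!versionLock.isWriteLockedByCurrentThread()) {
      throw new IllegalStateException("write lock must be held");
    }
    // await() releases the write lock while waiting and reacquires it after.
    while (!isSafeToOperate(dbName)) resetProgress.await();
  }
}
```

In this model, an operation on a Db takes the write lock, calls waitOngoingReset(dbName), and then proceeds. Dbs that sort before dbUnderInvalidation have already been reloaded in the current reset, so they can be touched between stages without racing it, while the others wait for later stages to finish.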
A caller iterating a getAllDbs() snapshot must also skip the Db currently being inspected if Db.isRemoved() returns true. Add helper class InvalidateAwareDbSnapshot for this kind of iteration.

Expand test_restart_catalogd_twice into test_restart_legacy_catalogd_twice and test_restart_local_catalogd_twice, and adjust the test to wait for a full IMPALA_CATALOG_TOPIC update, if necessary, using the CatalogChangeMonitor class.

Update CustomClusterTestSuite.wait_for_wm_init_complete() to correctly pass timeout values to the helper methods that it calls.

Reduce cluster_size from 10 to 3 in a few tests of test_workload_mgmt_init.py to avoid flakiness.

Testing:
- Pass exhaustive tests.

Change-Id: Ib4ae2154612746b34484391c5950e74b61f85c9d
---
M be/src/catalog/catalog-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Db.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/DebugUtils.java
M fe/src/test/java/org/apache/impala/testutil/ImpaladTestCatalog.java
M tests/common/custom_cluster_test_suite.py
M tests/custom_cluster/test_catalogd_ha.py
M tests/custom_cluster/test_concurrent_ddls.py
M tests/custom_cluster/test_ext_data_sources.py
M tests/custom_cluster/test_local_catalog.py
M tests/custom_cluster/test_metadata_replicas.py
M tests/custom_cluster/test_restart_services.py
M tests/custom_cluster/test_workload_mgmt_init.py
A tests/util/catalog_monitor.py
19 files changed, 534 insertions(+), 159 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/22640/19
--
To view, visit http://gerrit.cloudera.org:8080/22640

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib4ae2154612746b34484391c5950e74b61f85c9d
Gerrit-Change-Number: 22640
Gerrit-PatchSet: 19
Gerrit-Owner: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <k.venureddy2...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <saihema...@cloudera.com>