[jira] [Commented] (IMPALA-13850) Catalogd should not start metadata operation until initialization is done if HA is enabled

ASF subversion and git services (Jira) Wed, 09 Jul 2025 22:24:04 -0700


    [ 
https://issues.apache.org/jira/browse/IMPALA-13850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18004340#comment-18004340
 ]


ASF subversion and git services commented on IMPALA-13850:
----------------------------------------------------------

Commit 0b1a32fad8a6cc5173b0ac1585af69f08d583ed9 in impala's branch 
refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=0b1a32fad ]

IMPALA-13850 (part 4): Implement in-place reset for CatalogD

This patch improve the availability of CatalogD under huge INVALIDATE
METADATA operation. Previously, CatalogServiceCatalog.reset() hold
versionLock_.writeLock() for the whole reset duration. When the number
of database, tables, or functions are big, this write lock can be held
for a long time, preventing any other catalog operation from proceeding.

This patch improve the situation by:
1. Making CatalogServiceCatalog.reset() rebuild dbCache_ in place and
   occasionally release the write lock between rebuild stages.
2. Fetch databases, tables, and functions metadata from MetaStore in
   background using ExecutorService. Added catalog_reset_max_threads
   flag to control number of threads to do parallel fetch.

In order to do so, lexicographic order must be enforced during reset()
and ensure all Db invalidation within a single stage is complete before
releasing the write lock. Stages should run in approximately the same
amount of time. A catalog operation over a database must ensure that no
reset operation is currently running, or the database name is
lexicographically less than the current database-under-invalidation.

This patch adds CatalogResetManager to do background metadata fetching
and provide helper methods to help facilitate waiting for reset
progress. CatalogServiceCatalog must hold the versionLock_.writeLock()
before calling most of CatalogResetManager methods.

These are methods in CatalogServiceCatalog class that must wait for
CatalogResetManager.waitOngoingMetadataFetch():

addDb()
addFunction()
addIncompleteTable()
addTable()
invalidateTableIfExists()
removeDb()
removeFunction()
removeTable()
renameTable()
replaceTableIfUnchanged()
tryLock()
updateDb()
InvalidateAwareDbSnapshotIterator.hasNext()

Concurrent global IM must wait until currently running global IM
complete. The waiting happens by calling waitFullMetadataFetch().

CatalogServiceCatalog.getAllDbs() get a snapshot of dbCache_ values at a
time. With this patch, it is now possible that some Db in this snapshot
maybe removed from dbCache() by concurrent reset(). Caller that cares
about snapshot integrity like CatalogServiceCatalog.getCatalogDelta()
should be careful when iterating the snapshot. It must iterate in
lexicographic order, similar like reset(), and make sure that it does
not go beyond the current database-under-invalidation. It also must skip
the Db that it is currently being inspected if Db.isRemoved() is True.
Added helper class InvalidateAwareDbSnapshot for this kind of iteration

Override CatalogServiceCatalog.getDb() and
CatalogServiceCatalog.getDbs() to wait until first reset metadata
complete or looked up Db found in cache.

Expand test_restart_catalogd_twice to test_restart_legacy_catalogd_twice
and test_restart_local_catalogd_twice. Update
CustomClusterTestSuite.wait_for_wm_init_complete() to correctly pass
timeout values to helper methods that it calls. Reduce cluster_size from
10 to 3 in few tests of test_workload_mgmt_init.py to avoid flakiness.

Fixed HMS connection leak between tests in AuthorizationStmtTest (see
IMPALA-8073).

Testing:
- Pass exhaustive tests.

Change-Id: Ib4ae2154612746b34484391c5950e74b61f85c9d
Reviewed-on: http://gerrit.cloudera.org:8080/22640
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Quanlong Huang <[email protected]>


> Catalogd should not start metadata operation until initialization is done if 
> HA is enabled
> ------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-13850
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13850
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Wenzhe Zhou
>            Assignee: Riza Suminto
>            Priority: Critical
>             Fix For: Impala 5.0.0
>
>
> In a case reported by user, the catalogd initialization failed to complete. 
> Log messages showed that catalog HA was enabled. catalogd was blocked when 
> trying to acquire "CatalogServer.catalog_lock_" when calling 
> CatalogServer::UpdateActiveCatalogd() during statestore subscriber 
> registration.
> Log message showed that there was IM command issued before catalogd tried to 
> register to statestore.
> {code:java}
> I0310 12:21:34.093617     1 CatalogServiceCatalog.java:2188] Invalidated all 
> metadata.
> I0310 12:21:34.094341     1 thrift-server.cc:419] ThriftServer 
> 'StatestoreSubscriber' started on port: 23020
> I0310 12:21:34.094341  1816 TAcceptQueueServer.cpp:329] 
> connection_setup_thread_pool_size is set to 2
> I0310 12:21:34.094586     1 thrift-util.cc:198] TSocket::open() error on 
> socket (after THRIFT_POLL) <Host: localhost Port: 23020>: Connection refused
> I0310 12:21:34.094790     1 statestore-subscriber.cc:745] Starting statestore 
> subscriber
> {code}
> We should not allow any metadata operation until initialization is done. When 
> HA is enabled, catalog-server should not hold "CatalogServer.catalog_lock_" 
> for long time before active catalogd is assigned.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (IMPALA-13850) Catalogd should not start metadata operation until initialization is done if HA is enabled

Reply via email to