[
https://issues.apache.org/jira/browse/IMPALA-14307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18013638#comment-18013638
]
Quanlong Huang commented on IMPALA-14307:
-----------------------------------------
IMPALA-13989 adds a fallback invalidateTable on the new table if the rename
operation succeeds in HMS but fails to add it in catalog.
{code:java}
if (result.second == null) {
// The rename succeeded in HMS but failed in the catalog cache. The cache
is in an
// inconsistent state, so invalidate the new table to reload it.
newTblDesc = catalog_.invalidateTable(newTableName.toThrift(),
new Reference<>(), new Reference<>(), catalogTimeline);{code}
[https://github.com/apache/impala/blob/7477107ca370ef4b0709855522f8d955e47edbec/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L5845-L5849]
That's why we see logs of "Invalidating table metadata" in the thread doing
AlterTable Rename operation.
invalidateTable() succeeds without errors (though the createEventId_ is
incorrectly set to -1 which violates IMPALA-12712) but the table is invisiable
to the catalog update collector thread. Instead, it shows up in deleteLog_ so
catalog update collector thread collects as a deletion.
{code:java}
I20250811 16:22:58.250487 3608258 catalog-server.cc:1615] Collected deletion:
1:TABLE:test_ddls_with_invalidate_metadata_9525e717.test_6, version=8523,
original size=97, compressed size=95{code}
One possibility is the table is added into the old Db object. In
CatalogServiceCatalog#getDb(), we just wait for the initial reset to finish:
{code:java}
public Db getDb(String dbName) {
waitInitialResetCompletion(dbName);
return super.getDb(dbName);
}
{code}
waitInitialResetCompletion() won't wait if triggeredInitialReset_ is true. It's
set to true after the initial reset of the startup finishes. So in global
INVALIDATE METADATA triggered later, this method always return directly.
{code:java}
private void waitInitialResetCompletion(@Nullable String dbName) {
boolean isWaiting = false;
while (!triggeredInitialReset_) {
...
}
if (isWaiting) {
LOG.info("Initial reset completed{}.", (dbName != null ? " for db " +
dbName : ""));
}
}{code}
CatalogServiceCatalog#getDb() is used in invalidateTable() to get the Db
object. Without this waiting mechanism, it could get the old Db object which
will be replaced with a new one in the concurrent reset(). Then the table is
added to the old Db object and added into the deleteLog.
{code:java}
// Identify any deleted tables and add them to the delta log
Set<String> oldTableNames =
Sets.newHashSet(existingDb.getAllTableNames());
Set<String> newTableNames = Sets.newHashSet(newDb.getAllTableNames());
oldTableNames.removeAll(newTableNames);
for (String removedTableName: oldTableNames) {
Table removedTable =
IncompleteTable.createUninitializedTableForRemove(
existingDb, removedTableName);
removedTable.setCatalogVersion(incrementAndGetCatalogVersion());
deleteLog_.addRemovedObject(removedTable.toTCatalogObject());
}{code}
[https://github.com/apache/impala/blob/7477107ca370ef4b0709855522f8d955e47edbec/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L2356-L2365]
Note that the concurrent reset() loads the table list before the rename
operation finishes in HMS. So the new table name is missing in the list. Thus
missing in the new Db object.
I think we need to set triggeredInitialReset_ to false for all reset()
operations and probably rename it to something like resetInflight_.
CC [~rizaon] [~MikaelSmith]
> invalidateTable could add table to old Db object while concurrent reset is
> running
> ----------------------------------------------------------------------------------
>
> Key: IMPALA-14307
> URL: https://issues.apache.org/jira/browse/IMPALA-14307
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical
>
> Saw a failure in TestConcurrentDdls.test_ddls_with_invalidate_metadata:
> {code:python}
> custom_cluster/test_concurrent_ddls.py:50: in
> test_ddls_with_invalidate_metadata
> self._run_ddls_with_invalidation(unique_database, sync_ddl=False)
> custom_cluster/test_concurrent_ddls.py:179: in _run_ddls_with_invalidation
> worker[i].get(timeout=100)
> /data/jenkins/workspace/impala-asf-master-core-ubsan-arm/Impala-Toolchain/toolchain-packages-gcc10.4.0/python-2.7.16/lib/python2.7/multiprocessing/pool.py:572:
> in get
> raise self._value
> E AssertionError: Query b148f272b5821772:ace4369900000000 failed:
> E AnalysisException: Table does not exist:
> test_ddls_with_invalidate_metadata_9525e717.test_6
> E
> E
> E assert <bound method type.is_acceptable_error of <class
> 'test_concurrent_ddls.TestConcurrentDdls'>>('Query
> b148f272b5821772:ace4369900000000 failed:\nAnalysisException: Table does not
> exist: test_ddls_with_invalidate_metadata_9525e717.test_6\n\n', False)
> E + where <bound method type.is_acceptable_error of <class
> 'test_concurrent_ddls.TestConcurrentDdls'>> =
> <test_concurrent_ddls.TestConcurrentDdls object at
> 0xffff1ff7d810>.is_acceptable_error{code}
> The test runs in legacy catalog mode. Coordinator logs show that it receives
> a deletion update on the table before analying the REFRESH statement on it:
> {noformat}
> I20250811 16:22:58.429529 3608823 ImpaladCatalog.java:227] Deleting:
> TABLE:test_ddls_with_invalidate_metadata_9525e717.test_6 version: 8523 size:
> 97
> ...
> I20250811 16:22:58.540377 3609121 Frontend.java:2398]
> b148f272b5821772:ace4369900000000] Analyzing query: refresh
> test_ddls_with_invalidate_metadata_9525e717.test_6 db: default
> ...
> I20250811 16:22:58.542122 3609121 jni-util.cc:321]
> b148f272b5821772:ace4369900000000]
> org.apache.impala.common.AnalysisException: Table does not exist:
> test_ddls_with_invalidate_metadata_9525e717.test_6
> at
> org.apache.impala.analysis.ResetMetadataStmt.analyze(ResetMetadataStmt.java:173)
> at
> org.apache.impala.analysis.AnalysisContext$AnalysisDriverImpl.analyze(AnalysisContext.java:576)
> at
> org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:496)
> at
> org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2938)
> at
> org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2505)
> at
> org.apache.impala.service.Frontend.getTExecRequestWithFallback(Frontend.java:2374)
> at
> org.apache.impala.service.Frontend.createExecRequest(Frontend.java:2063)
> at
> org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:176){noformat}
> Catalogd logs show that invalidateTable triggered by alterTableOrViewRename()
> is running concurrently with a catalog reset (i.e. global INVALIDATE
> METADATA):
> {noformat}
> I20250811 16:22:58.129398 3609170 JniUtil.java:167]
> f548d67b72570890:626780e200000000] execDdl request: ALTER_TABLE
> test_ddls_with_invalidate_metadata_9525e717.test_6_2 RENAME_TABLE issued by
> jenkins
> I20250811 16:22:58.144670 3608935 JniUtil.java:167]
> 1d4918bff853c0ec:c5646e4a00000000] resetMetadata request: INVALIDATE ALL
> issued by jenkins
> I20250811 16:22:58.173722 3608935 CatalogServiceCatalog.java:2339]
> 1d4918bff853c0ec:c5646e4a00000000] Loaded table list (12) and functions (0)
> for database: test_ddls_with_invalidate_metadata_9525e717. Fetch duration: 2
> ms.
> I20250811 16:22:58.177870 3609170 CatalogServiceCatalog.java:3283]
> f548d67b72570890:626780e200000000] Invalidating table metadata:
> test_ddls_with_invalidate_metadata_9525e717.test_6
> I20250811 16:22:58.178402 3608935 JniUtil.java:178]
> 1d4918bff853c0ec:c5646e4a00000000] Finished resetMetadata request: INVALIDATE
> ALL issued by jenkins. Time spent: 34ms
> I20250811 16:22:58.181602 3609170 Table.java:246]
> f548d67b72570890:626780e200000000] createEventId_ for table:
> test_ddls_with_invalidate_metadata_9525e717.test_6 set to: -1
> I20250811 16:22:58.181922 3609170 JniUtil.java:178]
> f548d67b72570890:626780e200000000] Finished execDdl request: ALTER_TABLE
> test_ddls_with_invalidate_metadata_9525e717.test_6_2 RENAME_TABLE issued by
> jenkins. Time spent: 52ms
> {noformat}
> Then the catalog update collector thread saw the table disappears and sent
> the deletion update:
> {noformat}
> I20250811 16:22:58.247798 3608258 JniUtil.java:167] getCatalogDelta request:
> Getting catalog delta from version 8517
> I20250811 16:22:58.250487 3608258 catalog-server.cc:1615] Collected deletion:
> 1:TABLE:test_ddls_with_invalidate_metadata_9525e717.test_6, version=8523,
> original size=97, compressed size=95
> I20250811 16:22:58.250890 3608258 catalog-server.cc:1615] Collected update:
> 1:CATALOG_SERVICE_ID, version=8532, original size=60, compressed size=58
> I20250811 16:22:58.250959 3608258 JniUtil.java:178] Finished getCatalogDelta
> request: Getting catalog delta from version 8517. Time spent: 3ms{noformat}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]