[
https://issues.apache.org/jira/browse/IMPALA-14330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18018011#comment-18018011
]
ASF subversion and git services commented on IMPALA-14330:
----------------------------------------------------------
Commit 6f3deabb9d0c0ca98956316bbcc31e14a3363804 in impala's branch
refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=6f3deabb9 ]
IMPALA-14330: set a valid createEventId in global INVALIDATE METADATA
In global INVALIDATE METADATA (catalog reset), catalogd creates
IncompleteTable for all the known table names. However, the
createEventId is left uninitialized, so it remains -1. As a result,
tables can be dropped unintentionally by stale DropTable or
AlterTableRename events.
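
To illustrate why -1 is dangerous, here is a minimal Python sketch (not Impala's Java code). It assumes a simplified rule in which a drop or rename event removes a table only when the event id is newer than the table's createEventId. TableEntry, should_remove_on_drop_event and the event ids are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass

# Value left on placeholder tables when createEventId is never initialized.
UNINITIALIZED_EVENT_ID = -1


@dataclass
class TableEntry:
    name: str
    create_event_id: int = UNINITIALIZED_EVENT_ID


def should_remove_on_drop_event(table: TableEntry, drop_event_id: int) -> bool:
    """Assumed rule: the event applies only if it is newer than the table's creation."""
    return drop_event_id > table.create_event_id


# Placeholder created without a createEventId: every event id is greater than -1.
unset = TableEntry("db.t")
# Placeholder stamped with a (hypothetical) event id 100 taken at reset time.
stamped = TableEntry("db.t", create_event_id=100)

stale_drop_event_id = 90  # generated before the table was re-created
print(should_remove_on_drop_event(unset, stale_drop_event_id))
print(should_remove_on_drop_event(stamped, stale_drop_event_id))
```

Under this assumed rule, the table with -1 is removed by the stale event while the stamped table is kept.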
Ideally when catalogd creates an IncompleteTable during reset(), it
should fetch the latest event on that table and use its event id as the
createEventId. However, fetching such event ids for all tables cannot
finish in a reasonable time and would also add significant load on
HMS.
As a compromise, this patch uses the current event id when the reset()
operation starts, and assigns it to all IncompleteTable objects created in
this reset operation. This is enough to handle self CreateTable /
DropTable / AlterTableRename events since such self-events generated
before that id will be skipped. Such self-events generated after that id
are triggered by concurrent DDLs which will wait until the corresponding
table list is updated in reset(). The DDL will also update createEventId
to skip stale DropTable / AlterTableRename events.
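
The compromise can be sketched as follows, in Python rather than Impala's Java. The sketch assumes a single lookup of the current event id before any placeholder table is created. reset_catalog and get_current_event_id are hypothetical names, and the lambda stands in for a call to HMS.

```python
from typing import Callable, Dict, Iterable


def reset_catalog(table_names: Iterable[str],
                  get_current_event_id: Callable[[], int]) -> Dict[str, int]:
    """Return {table name: createEventId} for the placeholders created by one reset."""
    # One lookup, taken when the reset starts, instead of one lookup per table.
    reset_event_id = get_current_event_id()
    # Every placeholder created by this reset shares the same createEventId.
    return {name: reset_event_id for name in table_names}


# Hypothetical usage: the lambda replaces the HMS call.
placeholders = reset_catalog(["db.t1", "db.t2"], lambda: 500)
```

The cost is one HMS call per reset regardless of the number of tables, at the price of a createEventId that is only a lower bound rather than the exact latest event on each table.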
Concurrent CreateTable DDLs could set a stale createEventId if their HMS
operations finish before reset() and their catalog operations finish
after reset() creates the table. To address this, we add a check in
setCreateEventId() to skip stale event ids.
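
A minimal sketch of such a check, in Python for illustration only. The real setCreateEventId() is a Java method whose exact condition is not shown in this message; the class, method name and event ids below are assumptions.

```python
class Table:
    def __init__(self, name, create_event_id):
        self.name = name
        self.create_event_id = create_event_id

    def set_create_event_id(self, event_id):
        """Apply event_id only if it is newer; return whether it was applied."""
        if event_id <= self.create_event_id:
            # Stale id, e.g. from a DDL whose HMS operation finished before reset().
            return False
        self.create_event_id = event_id
        return True


# reset() created the placeholder with (hypothetical) event id 100.
tbl = Table("db.t", create_event_id=100)
# A concurrent CreateTable finished in HMS at event id 95 and reports it late.
applied_stale = tbl.set_create_event_id(95)
# A later DDL reports a newer event id, which is accepted.
applied_newer = tbl.set_create_event_id(120)
```

With this guard the createEventId never moves backwards, so a late-arriving DDL cannot reopen the window for stale drop or rename events.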
The current event id of reset() is also used in DeleteEventLog to track
tables removed by this operation.
Refactored IncompleteTable.createUninitializedTable() to force passing a
createEventId as a parameter.
To ease debugging, this patch adds logs when a table is added/removed in
HMS event processing, when the catalog version of a table changes, and
when processing of a rename event starts.
This patch also refactors CatalogOpExecutor.alterTableOrViewRename() by
extracting some code into methods. It also identifies and fixes a race:
DeleteEventLog should be updated before renameTable() updates the
catalog cache, so that the removed old table won't be added back by
concurrent processing of a stale CREATE_TABLE event.
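
The ordering matters because the event processor consults the delete log when it sees a CREATE_TABLE event. The Python sketch below is a simplified, single-threaded model of that idea, not Impala's implementation. DeleteLog, rename_table, on_create_table_event and all event ids are hypothetical.

```python
class DeleteLog:
    """Hypothetical stand-in for DeleteEventLog: event id at which a table was removed."""

    def __init__(self):
        self._removed_at = {}

    def add_removed(self, event_id, table_name):
        self._removed_at[table_name] = event_id

    def was_removed_after(self, event_id, table_name):
        removed_at = self._removed_at.get(table_name)
        return removed_at is not None and removed_at > event_id


def rename_table(cache, delete_log, old_name, new_name, rename_event_id):
    # Record the removal first: from the moment old_name leaves the cache,
    # the event processor can already see that it was removed.
    delete_log.add_removed(rename_event_id, old_name)
    cache.pop(old_name, None)
    cache[new_name] = rename_event_id


def on_create_table_event(cache, delete_log, name, event_id):
    """Add the table unless it already exists or was removed by a later operation."""
    if name in cache or delete_log.was_removed_after(event_id, name):
        return False
    cache[name] = event_id
    return True


cache = {"db.t": 10}  # table name -> createEventId
log = DeleteLog()
rename_table(cache, log, "db.t", "db.t2", rename_event_id=20)
# The stale CREATE_TABLE event (id 10) for the old name is skipped.
stale_added = on_create_table_event(cache, log, "db.t", 10)
```

If the log were updated after the cache instead, a CREATE_TABLE event processed in between would find neither the table nor a removal record and would add the old table back.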
_run_ddls_with_invalidation in test_concurrent_ddls.py could still fail
with a timeout when running with sync_ddl=true. The reason is that when
the DDL hits IMPALA-9135 and hangs, it needs catalogd to send new catalog
updates to reach the max waiting attempts (see waitForSyncDdlVersion()).
However, if all other concurrent threads have already finished, there
won't be any new catalog updates, so the DDL waits forever and the test
eventually times out. To work around this, this patch adds another
concurrent thread that keeps creating new tables until the test finishes.
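
The workaround pattern can be sketched with the standard threading module. This is not the code in test_concurrent_ddls.py; run_with_background_creator, create_table and the pause length are hypothetical, and create_table stands in for issuing a CREATE TABLE statement.

```python
import threading


def run_with_background_creator(workers, create_table):
    """Run the worker callables concurrently while create_table(i) keeps being called."""
    stop = threading.Event()

    def keep_creating():
        i = 0
        while not stop.is_set():
            create_table(i)  # each new table triggers another catalog update
            i += 1
            stop.wait(0.01)  # short pause so the loop does not spin

    creator = threading.Thread(target=keep_creating)
    creator.start()
    threads = [threading.Thread(target=w) for w in workers]
    try:
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    finally:
        # Stop the creator even if starting or joining a worker fails.
        stop.set()
        creator.join()
```

Because the creator stops only after all workers have joined, a worker that is waiting for further catalog updates keeps receiving them for as long as it runs.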
Tests:
- Ran the following tests in test_concurrent_ddls.py for 10 rounds. Each
round takes 11 mins.
- test_ddls_with_invalidate_metadata
- test_ddls_with_invalidate_metadata_sync_ddl
- test_mixed_catalog_ddls_with_invalidate_metadata
- test_mixed_catalog_ddls_with_invalidate_metadata_sync_ddl
- test_local_catalog_ddls_with_invalidate_metadata
- test_local_catalog_ddls_with_invalidate_metadata_sync_ddl
- test_local_catalog_ddls_with_invalidate_metadata_unlock_gap
Change-Id: I6506821dedf7701cdfa58d14cae5760ee178c4ec
Reviewed-on: http://gerrit.cloudera.org:8080/23346
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Global INVALIDATE METADATA should set a valid createEventId instead of -1 on
> all tables
> ---------------------------------------------------------------------------------------
>
> Key: IMPALA-14330
> URL: https://issues.apache.org/jira/browse/IMPALA-14330
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical
> Attachments:
> catalogd.ip-172-31-59-7.ubuntu.log.INFO.20250819-225701.1846090,
> impalad.ip-172-31-59-7.ubuntu.log.WARNING.20250819-225701.1846122
>
>
> Similar to IMPALA-14307, using -1 as the createEventId leads to the table
> being dropped by stale events, e.g. ALTER_TABLE RENAME events. We should
> avoid doing so in general, or at least in catalog reset (i.e. global
> INVALIDATE METADATA), to make the tests in TestConcurrentDdls stable.
> One recent failure is
> [https://jenkins.impala.io/job/ubuntu-20.04-from-scratch/6409]
> {noformat}
> custom_cluster/test_concurrent_ddls.py:95: in
> test_local_catalog_ddls_with_invalidate_metadata_unlock_gap
> self._run_ddls_with_invalidation(unique_database, sync_ddl=False)
> custom_cluster/test_concurrent_ddls.py:179: in _run_ddls_with_invalidation
> worker[i].get(timeout=100)
> ../toolchain/toolchain-packages-gcc10.4.0/python-2.7.16/lib/python2.7/multiprocessing/pool.py:572:
> in get
> raise self._value
> E AssertionError: Query 524559e2ab1655af:14cdb06300000000 failed:
> E AnalysisException: Table does not exist:
> test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11
> E
> E
> E assert <bound method type.is_acceptable_error of <class
> 'test_concurrent_ddls.TestConcurrentDdls'>>('Query
> 524559e2ab1655af:14cdb06300000000 failed:\nAnalysisException: Table does not
> exist:
> test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11\n\n',
> False)
> E + where <bound method type.is_acceptable_error of <class
> 'test_concurrent_ddls.TestConcurrentDdls'>> =
> <test_concurrent_ddls.TestConcurrentDdls object at
> 0x7ff4623e7d50>.is_acceptable_error{noformat}
> Checking the coordinator logs, it happens around 22:57:13.984344
> {noformat}
> I20250819 22:57:13.984344 1847783 jni-util.cc:321]
> 524559e2ab1655af:14cdb06300000000]
> org.apache.impala.common.AnalysisException: Table does not exist:
> test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11{noformat}
> Checking the catalogd logs, EventProcessor is paused at event id 38012 by a
> global reset. It has fetched a batch of 22 events starting from event id
> 38013.
> {noformat}
> I20250819 22:57:13.927796 1846530 MetastoreEventsProcessor.java:1160]
> Received 22 events. First event id: 38013.
> I20250819 22:57:13.930258 1848179 JniUtil.java:167]
> ec43c8489011d054:946e984c00000000] resetMetadata request: INVALIDATE ALL
> issued by ubuntu
> ...
> I20250819 22:57:13.931205 1848179 MetastoreEventsProcessor.java:973]
> ec43c8489011d054:946e984c00000000] Event processing is paused. Last synced
> event id is 38012{noformat}
> After the reset finishes, EventProcessor is started at event id 38034. But it
> continues to process the fetched batch. This is something we should fix.
> {noformat}
> I20250819 22:57:13.954586 1848179 MetastoreEventsProcessor.java:1062]
> ec43c8489011d054:946e984c00000000] Metastore event processing restarted. Last
> synced event id was updated from 38012 to 38034{noformat}
> While it is processing event 38018, the log indicates the createEventId is -1:
> {noformat}
> I20250819 22:57:13.965526 1846530 CatalogOpExecutor.java:883] EventId: 38018
> Table
> test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11
> was not added since it already exists in catalog.
> ...
> W20250819 22:57:13.966677 1846530 CatalogOpExecutor.java:886] Existing table
> test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11
> create event Id: -1 does not match the event id: 38018{noformat}
> This leads to the table being dropped later by event 38027 which we should
> skip:
> {noformat}
> I20250819 22:57:13.977993 1846530 Table.java:246] createEventId_ for table:
> test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11_2
> set to: 38027
> I20250819 22:57:13.978402 1846530 Table.java:261] lastSyncedEventId_ for
> table:
> test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11_2
> set from -1 to 38027
> I20250819 22:57:13.978508 1846530 MetastoreEvents.java:860] EventId: 38027
> EventType: ALTER_TABLE Target:
> test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11_2.
> Removed table
> test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11{noformat}
> The table is added by reset() which sets createEventId to -1 for all tables.
> We should at least use the latest event id before reset() to avoid such issues.