[
https://issues.apache.org/jira/browse/IMPALA-14330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Quanlong Huang updated IMPALA-14330:
------------------------------------
Description:
Similar to IMPALA-14307, using -1 as the createEventId leads to the table being
dropped by stale events, e.g. ALTER_TABLE RENAME events. We should avoid doing
so generally. Or at least in catalog reset (i.e. global INVALIDATE METADATA) to
make the tests in TestConcurrentDdls stable.
One recent failure is
[https://jenkins.impala.io/job/ubuntu-20.04-from-scratch/6409]
{noformat}
custom_cluster/test_concurrent_ddls.py:95: in
test_local_catalog_ddls_with_invalidate_metadata_unlock_gap
self._run_ddls_with_invalidation(unique_database, sync_ddl=False)
custom_cluster/test_concurrent_ddls.py:179: in _run_ddls_with_invalidation
worker[i].get(timeout=100)
../toolchain/toolchain-packages-gcc10.4.0/python-2.7.16/lib/python2.7/multiprocessing/pool.py:572:
in get
raise self._value
E AssertionError: Query 524559e2ab1655af:14cdb06300000000 failed:
E AnalysisException: Table does not exist:
test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11
E
E
E assert <bound method type.is_acceptable_error of <class
'test_concurrent_ddls.TestConcurrentDdls'>>('Query
524559e2ab1655af:14cdb06300000000 failed:\nAnalysisException: Table does not
exist:
test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11\n\n',
False)
E + where <bound method type.is_acceptable_error of <class
'test_concurrent_ddls.TestConcurrentDdls'>> =
<test_concurrent_ddls.TestConcurrentDdls object at
0x7ff4623e7d50>.is_acceptable_error{noformat}
Checking the coordinator logs, it happens arround 22:57:13.984344
{noformat}
I20250819 22:57:13.984344 1847783 jni-util.cc:321]
524559e2ab1655af:14cdb06300000000] org.apache.impala.common.AnalysisException:
Table does not exist:
test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11{noformat}
Checking the catalogd logs, EventProcessor is paused at event id 38012 by a
global reset. It has fetched a batch of 22 events starting from event id 38013.
{noformat}
I20250819 22:57:13.927796 1846530 MetastoreEventsProcessor.java:1160] Received
22 events. First event id: 38013.
I20250819 22:57:13.930258 1848179 JniUtil.java:167]
ec43c8489011d054:946e984c00000000] resetMetadata request: INVALIDATE ALL issued
by ubuntu
...
I20250819 22:57:13.931205 1848179 MetastoreEventsProcessor.java:973]
ec43c8489011d054:946e984c00000000] Event processing is paused. Last synced
event id is 38012{noformat}
After the reset finishes, EventProecssor is started at event id 38034. But it
continues to processed the fetched batch. This is something we should fix.
{noformat}
I20250819 22:57:13.954586 1848179 MetastoreEventsProcessor.java:1062]
ec43c8489011d054:946e984c00000000] Metastore event processing restarted. Last
synced event id was updated from 38012 to 38034{noformat}
While it processing event 38018, the log indicates the createEventId is -1:
{noformat}
I20250819 22:57:13.965526 1846530 CatalogOpExecutor.java:883] EventId: 38018
Table
test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11
was not added since it already exists in catalog.
...
W20250819 22:57:13.966677 1846530 CatalogOpExecutor.java:886] Existing table
test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11
create event Id: -1 does not match the event id: 38018{noformat}
This leads to the table being dropped later by event 38027 which we should skip:
{noformat}
I20250819 22:57:13.977993 1846530 Table.java:246] createEventId_ for table:
test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11_2
set to: 38027
I20250819 22:57:13.978402 1846530 Table.java:261] lastSyncedEventId_ for table:
test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11_2
set from -1 to 38027
I20250819 22:57:13.978508 1846530 MetastoreEvents.java:860] EventId: 38027
EventType: ALTER_TABLE Target:
test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11_2.
Removed table
test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11{noformat}
The table is added by reset() which sets createEventId to -1 for all tables. We
should at least use the latest event id before reset() to avoid such issue.
was:
Similar to IMPALA-14307, using -1 as the createEventId leads to the table being
dropped by stale events, e.g. ALTER_TABLE RENAME events. We should avoid doing
so generally. Or at least in catalog reset (i.e. global INVALIDATE METADATA) to
make the tests in TestConcurrentDdls stable.
One recent failure is
[https://jenkins.impala.io/job/ubuntu-20.04-from-scratch/6409]
{noformat}
custom_cluster/test_concurrent_ddls.py:95: in
test_local_catalog_ddls_with_invalidate_metadata_unlock_gap
self._run_ddls_with_invalidation(unique_database, sync_ddl=False)
custom_cluster/test_concurrent_ddls.py:179: in _run_ddls_with_invalidation
worker[i].get(timeout=100)
../toolchain/toolchain-packages-gcc10.4.0/python-2.7.16/lib/python2.7/multiprocessing/pool.py:572:
in get
raise self._value
E AssertionError: Query 524559e2ab1655af:14cdb06300000000 failed:
E AnalysisException: Table does not exist:
test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11
E
E
E assert <bound method type.is_acceptable_error of <class
'test_concurrent_ddls.TestConcurrentDdls'>>('Query
524559e2ab1655af:14cdb06300000000 failed:\nAnalysisException: Table does not
exist:
test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11\n\n',
False)
E + where <bound method type.is_acceptable_error of <class
'test_concurrent_ddls.TestConcurrentDdls'>> =
<test_concurrent_ddls.TestConcurrentDdls object at
0x7ff4623e7d50>.is_acceptable_error{noformat}
Checking the coordinator logs, it happens arround 22:57:13.984344
{noformat}
I20250819 22:57:13.984344 1847783 jni-util.cc:321]
524559e2ab1655af:14cdb06300000000] org.apache.impala.common.AnalysisException:
Table does not exist:
test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11{noformat}
Checking the catalogd logs, EventProcessor is paused at event id 38012 by a
global reset. It has fetched a batch of 22 events starting from event id 38013.
{noformat}
I20250819 22:57:13.927796 1846530 MetastoreEventsProcessor.java:1160] Received
22 events. First event id: 38013.
I20250819 22:57:13.930258 1848179 JniUtil.java:167]
ec43c8489011d054:946e984c00000000] resetMetadata request: INVALIDATE ALL issued
by ubuntu
...
I20250819 22:57:13.931205 1848179 MetastoreEventsProcessor.java:973]
ec43c8489011d054:946e984c00000000] Event processing is paused. Last synced
event id is 38012{noformat}
After the reset finishes, EventProecssor is started at event id 38034. But it
continues to processed the fetched batch. This is something we should fix.
{noformat}
I20250819 22:57:13.954586 1848179 MetastoreEventsProcessor.java:1062]
ec43c8489011d054:946e984c00000000] Metastore event processing restarted. Last
synced event id was updated from 38012 to 38034{noformat}
While it processing event 38018, the log indicates the createEventId is -1:
{noformat}
I20250819 22:57:13.965526 1846530 CatalogOpExecutor.java:883] EventId: 38018
Table
test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11
was not added since it already exists in catalog.
...
W20250819 22:57:13.966677 1846530 CatalogOpExecutor.java:886] Existing table
test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11
create event Id: -1 does not match the event id: 38018{noformat}
This leads to the table being dropped later by event 38027 which we should skip:
{noformat}
I20250819 22:57:13.977993 1846530 Table.java:246] createEventId_ for table:
test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11_2
set to: 38027
I20250819 22:57:13.978402 1846530 Table.java:261] lastSyncedEventId_ for table:
test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11_2
set from -1 to 38027
I20250819 22:57:13.978508 1846530 MetastoreEvents.java:860] EventId: 38027
EventType: ALTER_TABLE Target:
test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11_2.
Removed table
test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11{noformat}
> Global INVALIDATE METADATA should set a valid createEventId instead of -1 on
> all tables
> ---------------------------------------------------------------------------------------
>
> Key: IMPALA-14330
> URL: https://issues.apache.org/jira/browse/IMPALA-14330
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical
> Attachments:
> catalogd.ip-172-31-59-7.ubuntu.log.INFO.20250819-225701.1846090,
> impalad.ip-172-31-59-7.ubuntu.log.WARNING.20250819-225701.1846122
>
>
> Similar to IMPALA-14307, using -1 as the createEventId leads to the table
> being dropped by stale events, e.g. ALTER_TABLE RENAME events. We should
> avoid doing so generally. Or at least in catalog reset (i.e. global
> INVALIDATE METADATA) to make the tests in TestConcurrentDdls stable.
> One recent failure is
> [https://jenkins.impala.io/job/ubuntu-20.04-from-scratch/6409]
> {noformat}
> custom_cluster/test_concurrent_ddls.py:95: in
> test_local_catalog_ddls_with_invalidate_metadata_unlock_gap
> self._run_ddls_with_invalidation(unique_database, sync_ddl=False)
> custom_cluster/test_concurrent_ddls.py:179: in _run_ddls_with_invalidation
> worker[i].get(timeout=100)
> ../toolchain/toolchain-packages-gcc10.4.0/python-2.7.16/lib/python2.7/multiprocessing/pool.py:572:
> in get
> raise self._value
> E AssertionError: Query 524559e2ab1655af:14cdb06300000000 failed:
> E AnalysisException: Table does not exist:
> test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11
> E
> E
> E assert <bound method type.is_acceptable_error of <class
> 'test_concurrent_ddls.TestConcurrentDdls'>>('Query
> 524559e2ab1655af:14cdb06300000000 failed:\nAnalysisException: Table does not
> exist:
> test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11\n\n',
> False)
> E + where <bound method type.is_acceptable_error of <class
> 'test_concurrent_ddls.TestConcurrentDdls'>> =
> <test_concurrent_ddls.TestConcurrentDdls object at
> 0x7ff4623e7d50>.is_acceptable_error{noformat}
> Checking the coordinator logs, it happens arround 22:57:13.984344
> {noformat}
> I20250819 22:57:13.984344 1847783 jni-util.cc:321]
> 524559e2ab1655af:14cdb06300000000]
> org.apache.impala.common.AnalysisException: Table does not exist:
> test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11{noformat}
> Checking the catalogd logs, EventProcessor is paused at event id 38012 by a
> global reset. It has fetched a batch of 22 events starting from event id
> 38013.
> {noformat}
> I20250819 22:57:13.927796 1846530 MetastoreEventsProcessor.java:1160]
> Received 22 events. First event id: 38013.
> I20250819 22:57:13.930258 1848179 JniUtil.java:167]
> ec43c8489011d054:946e984c00000000] resetMetadata request: INVALIDATE ALL
> issued by ubuntu
> ...
> I20250819 22:57:13.931205 1848179 MetastoreEventsProcessor.java:973]
> ec43c8489011d054:946e984c00000000] Event processing is paused. Last synced
> event id is 38012{noformat}
> After the reset finishes, EventProecssor is started at event id 38034. But it
> continues to processed the fetched batch. This is something we should fix.
> {noformat}
> I20250819 22:57:13.954586 1848179 MetastoreEventsProcessor.java:1062]
> ec43c8489011d054:946e984c00000000] Metastore event processing restarted. Last
> synced event id was updated from 38012 to 38034{noformat}
> While it processing event 38018, the log indicates the createEventId is -1:
> {noformat}
> I20250819 22:57:13.965526 1846530 CatalogOpExecutor.java:883] EventId: 38018
> Table
> test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11
> was not added since it already exists in catalog.
> ...
> W20250819 22:57:13.966677 1846530 CatalogOpExecutor.java:886] Existing table
> test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11
> create event Id: -1 does not match the event id: 38018{noformat}
> This leads to the table being dropped later by event 38027 which we should
> skip:
> {noformat}
> I20250819 22:57:13.977993 1846530 Table.java:246] createEventId_ for table:
> test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11_2
> set to: 38027
> I20250819 22:57:13.978402 1846530 Table.java:261] lastSyncedEventId_ for
> table:
> test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11_2
> set from -1 to 38027
> I20250819 22:57:13.978508 1846530 MetastoreEvents.java:860] EventId: 38027
> EventType: ALTER_TABLE Target:
> test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11_2.
> Removed table
> test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11{noformat}
> The table is added by reset() which sets createEventId to -1 for all tables.
> We should at least use the latest event id before reset() to avoid such issue.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]