[ 
https://issues.apache.org/jira/browse/IMPALA-14330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-14330:
------------------------------------
    Description: 
Similar to IMPALA-14307, using -1 as the createEventId leads to the table being 
dropped by stale events, e.g. ALTER_TABLE RENAME events. We should avoid doing 
so generally. Or at least in catalog reset (i.e. global INVALIDATE METADATA) to 
make the tests in TestConcurrentDdls stable.

One recent failure is 
[https://jenkins.impala.io/job/ubuntu-20.04-from-scratch/6409]
{noformat}
custom_cluster/test_concurrent_ddls.py:95: in 
test_local_catalog_ddls_with_invalidate_metadata_unlock_gap
    self._run_ddls_with_invalidation(unique_database, sync_ddl=False)
custom_cluster/test_concurrent_ddls.py:179: in _run_ddls_with_invalidation
    worker[i].get(timeout=100)
../toolchain/toolchain-packages-gcc10.4.0/python-2.7.16/lib/python2.7/multiprocessing/pool.py:572:
 in get
    raise self._value
E   AssertionError: Query 524559e2ab1655af:14cdb06300000000 failed:
E     AnalysisException: Table does not exist: 
test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11
E     
E     
E   assert <bound method type.is_acceptable_error of <class 
'test_concurrent_ddls.TestConcurrentDdls'>>('Query 
524559e2ab1655af:14cdb06300000000 failed:\nAnalysisException: Table does not 
exist: 
test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11\n\n',
 False)
E    +  where <bound method type.is_acceptable_error of <class 
'test_concurrent_ddls.TestConcurrentDdls'>> = 
<test_concurrent_ddls.TestConcurrentDdls object at 
0x7ff4623e7d50>.is_acceptable_error{noformat}
Checking the coordinator logs, it happens arround 22:57:13.984344
{noformat}
I20250819 22:57:13.984344 1847783 jni-util.cc:321] 
524559e2ab1655af:14cdb06300000000] org.apache.impala.common.AnalysisException: 
Table does not exist: 
test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11{noformat}
Checking the catalogd logs, EventProcessor is paused at event id 38012 by a 
global reset. It has fetched a batch of 22 events starting from event id 38013.
{noformat}
I20250819 22:57:13.927796 1846530 MetastoreEventsProcessor.java:1160] Received 
22 events. First event id: 38013.
I20250819 22:57:13.930258 1848179 JniUtil.java:167] 
ec43c8489011d054:946e984c00000000] resetMetadata request: INVALIDATE ALL issued 
by ubuntu
...
I20250819 22:57:13.931205 1848179 MetastoreEventsProcessor.java:973] 
ec43c8489011d054:946e984c00000000] Event processing is paused. Last synced 
event id is 38012{noformat}
After the reset finishes, EventProecssor is started at event id 38034. But it 
continues to processed the fetched batch. This is something we should fix.
{noformat}
I20250819 22:57:13.954586 1848179 MetastoreEventsProcessor.java:1062] 
ec43c8489011d054:946e984c00000000] Metastore event processing restarted. Last 
synced event id was updated from 38012 to 38034{noformat}
While it processing event 38018, the log indicates the createEventId is -1:
{noformat}
I20250819 22:57:13.965526 1846530 CatalogOpExecutor.java:883] EventId: 38018 
Table 
test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11 
was not added since it already exists in catalog.
...
W20250819 22:57:13.966677 1846530 CatalogOpExecutor.java:886] Existing table 
test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11 
create event Id: -1 does not match the event id: 38018{noformat}
This leads to the table being dropped later by event 38027 which we should skip:
{noformat}
I20250819 22:57:13.977993 1846530 Table.java:246] createEventId_ for table: 
test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11_2 
set to: 38027
I20250819 22:57:13.978402 1846530 Table.java:261] lastSyncedEventId_ for table: 
test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11_2 
set from -1 to 38027
I20250819 22:57:13.978508 1846530 MetastoreEvents.java:860] EventId: 38027 
EventType: ALTER_TABLE Target: 
test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11_2. 
Removed table 
test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11{noformat}
The table is added by reset() which sets createEventId to -1 for all tables. We 
should at least use the latest event id before reset() to avoid such issue.

  was:
Similar to IMPALA-14307, using -1 as the createEventId leads to the table being 
dropped by stale events, e.g. ALTER_TABLE RENAME events. We should avoid doing 
so generally. Or at least in catalog reset (i.e. global INVALIDATE METADATA) to 
make the tests in TestConcurrentDdls stable.

One recent failure is 
[https://jenkins.impala.io/job/ubuntu-20.04-from-scratch/6409]
{noformat}
custom_cluster/test_concurrent_ddls.py:95: in 
test_local_catalog_ddls_with_invalidate_metadata_unlock_gap
    self._run_ddls_with_invalidation(unique_database, sync_ddl=False)
custom_cluster/test_concurrent_ddls.py:179: in _run_ddls_with_invalidation
    worker[i].get(timeout=100)
../toolchain/toolchain-packages-gcc10.4.0/python-2.7.16/lib/python2.7/multiprocessing/pool.py:572:
 in get
    raise self._value
E   AssertionError: Query 524559e2ab1655af:14cdb06300000000 failed:
E     AnalysisException: Table does not exist: 
test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11
E     
E     
E   assert <bound method type.is_acceptable_error of <class 
'test_concurrent_ddls.TestConcurrentDdls'>>('Query 
524559e2ab1655af:14cdb06300000000 failed:\nAnalysisException: Table does not 
exist: 
test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11\n\n',
 False)
E    +  where <bound method type.is_acceptable_error of <class 
'test_concurrent_ddls.TestConcurrentDdls'>> = 
<test_concurrent_ddls.TestConcurrentDdls object at 
0x7ff4623e7d50>.is_acceptable_error{noformat}
Checking the coordinator logs, it happens arround 22:57:13.984344
{noformat}
I20250819 22:57:13.984344 1847783 jni-util.cc:321] 
524559e2ab1655af:14cdb06300000000] org.apache.impala.common.AnalysisException: 
Table does not exist: 
test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11{noformat}
Checking the catalogd logs, EventProcessor is paused at event id 38012 by a 
global reset. It has fetched a batch of 22 events starting from event id 38013.
{noformat}
I20250819 22:57:13.927796 1846530 MetastoreEventsProcessor.java:1160] Received 
22 events. First event id: 38013.
I20250819 22:57:13.930258 1848179 JniUtil.java:167] 
ec43c8489011d054:946e984c00000000] resetMetadata request: INVALIDATE ALL issued 
by ubuntu
...
I20250819 22:57:13.931205 1848179 MetastoreEventsProcessor.java:973] 
ec43c8489011d054:946e984c00000000] Event processing is paused. Last synced 
event id is 38012{noformat}
After the reset finishes, EventProecssor is started at event id 38034. But it 
continues to processed the fetched batch. This is something we should fix.
{noformat}
I20250819 22:57:13.954586 1848179 MetastoreEventsProcessor.java:1062] 
ec43c8489011d054:946e984c00000000] Metastore event processing restarted. Last 
synced event id was updated from 38012 to 38034{noformat}
While it processing event 38018, the log indicates the createEventId is -1:
{noformat}
I20250819 22:57:13.965526 1846530 CatalogOpExecutor.java:883] EventId: 38018 
Table 
test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11 
was not added since it already exists in catalog.
...
W20250819 22:57:13.966677 1846530 CatalogOpExecutor.java:886] Existing table 
test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11 
create event Id: -1 does not match the event id: 38018{noformat}
This leads to the table being dropped later by event 38027 which we should skip:
{noformat}
I20250819 22:57:13.977993 1846530 Table.java:246] createEventId_ for table: 
test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11_2 
set to: 38027
I20250819 22:57:13.978402 1846530 Table.java:261] lastSyncedEventId_ for table: 
test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11_2 
set from -1 to 38027
I20250819 22:57:13.978508 1846530 MetastoreEvents.java:860] EventId: 38027 
EventType: ALTER_TABLE Target: 
test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11_2. 
Removed table 
test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11{noformat}


> Global INVALIDATE METADATA should set a valid createEventId instead of -1 on 
> all tables
> ---------------------------------------------------------------------------------------
>
>                 Key: IMPALA-14330
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14330
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>         Attachments: 
> catalogd.ip-172-31-59-7.ubuntu.log.INFO.20250819-225701.1846090, 
> impalad.ip-172-31-59-7.ubuntu.log.WARNING.20250819-225701.1846122
>
>
> Similar to IMPALA-14307, using -1 as the createEventId leads to the table 
> being dropped by stale events, e.g. ALTER_TABLE RENAME events. We should 
> avoid doing so generally. Or at least in catalog reset (i.e. global 
> INVALIDATE METADATA) to make the tests in TestConcurrentDdls stable.
> One recent failure is 
> [https://jenkins.impala.io/job/ubuntu-20.04-from-scratch/6409]
> {noformat}
> custom_cluster/test_concurrent_ddls.py:95: in 
> test_local_catalog_ddls_with_invalidate_metadata_unlock_gap
>     self._run_ddls_with_invalidation(unique_database, sync_ddl=False)
> custom_cluster/test_concurrent_ddls.py:179: in _run_ddls_with_invalidation
>     worker[i].get(timeout=100)
> ../toolchain/toolchain-packages-gcc10.4.0/python-2.7.16/lib/python2.7/multiprocessing/pool.py:572:
>  in get
>     raise self._value
> E   AssertionError: Query 524559e2ab1655af:14cdb06300000000 failed:
> E     AnalysisException: Table does not exist: 
> test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11
> E     
> E     
> E   assert <bound method type.is_acceptable_error of <class 
> 'test_concurrent_ddls.TestConcurrentDdls'>>('Query 
> 524559e2ab1655af:14cdb06300000000 failed:\nAnalysisException: Table does not 
> exist: 
> test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11\n\n',
>  False)
> E    +  where <bound method type.is_acceptable_error of <class 
> 'test_concurrent_ddls.TestConcurrentDdls'>> = 
> <test_concurrent_ddls.TestConcurrentDdls object at 
> 0x7ff4623e7d50>.is_acceptable_error{noformat}
> Checking the coordinator logs, it happens arround 22:57:13.984344
> {noformat}
> I20250819 22:57:13.984344 1847783 jni-util.cc:321] 
> 524559e2ab1655af:14cdb06300000000] 
> org.apache.impala.common.AnalysisException: Table does not exist: 
> test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11{noformat}
> Checking the catalogd logs, EventProcessor is paused at event id 38012 by a 
> global reset. It has fetched a batch of 22 events starting from event id 
> 38013.
> {noformat}
> I20250819 22:57:13.927796 1846530 MetastoreEventsProcessor.java:1160] 
> Received 22 events. First event id: 38013.
> I20250819 22:57:13.930258 1848179 JniUtil.java:167] 
> ec43c8489011d054:946e984c00000000] resetMetadata request: INVALIDATE ALL 
> issued by ubuntu
> ...
> I20250819 22:57:13.931205 1848179 MetastoreEventsProcessor.java:973] 
> ec43c8489011d054:946e984c00000000] Event processing is paused. Last synced 
> event id is 38012{noformat}
> After the reset finishes, EventProecssor is started at event id 38034. But it 
> continues to processed the fetched batch. This is something we should fix.
> {noformat}
> I20250819 22:57:13.954586 1848179 MetastoreEventsProcessor.java:1062] 
> ec43c8489011d054:946e984c00000000] Metastore event processing restarted. Last 
> synced event id was updated from 38012 to 38034{noformat}
> While it processing event 38018, the log indicates the createEventId is -1:
> {noformat}
> I20250819 22:57:13.965526 1846530 CatalogOpExecutor.java:883] EventId: 38018 
> Table 
> test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11 
> was not added since it already exists in catalog.
> ...
> W20250819 22:57:13.966677 1846530 CatalogOpExecutor.java:886] Existing table 
> test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11 
> create event Id: -1 does not match the event id: 38018{noformat}
> This leads to the table being dropped later by event 38027 which we should 
> skip:
> {noformat}
> I20250819 22:57:13.977993 1846530 Table.java:246] createEventId_ for table: 
> test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11_2
>  set to: 38027
> I20250819 22:57:13.978402 1846530 Table.java:261] lastSyncedEventId_ for 
> table: 
> test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11_2
>  set from -1 to 38027
> I20250819 22:57:13.978508 1846530 MetastoreEvents.java:860] EventId: 38027 
> EventType: ALTER_TABLE Target: 
> test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11_2.
>  Removed table 
> test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11{noformat}
> The table is added by reset() which sets createEventId to -1 for all tables. 
> We should at least use the latest event id before reset() to avoid such issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to