[ https://issues.apache.org/jira/browse/IMPALA-13607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18045419#comment-18045419 ]

Quanlong Huang commented on IMPALA-13607:
-----------------------------------------

custom_cluster/test_metastore_service.py tests the feature that lets catalogd
provide HMS endpoints, i.e. running with --start_hms_server=true (IMPALA-10612).
For DDL/DML requests, catalogd just delegates them to the HMS APIs without
reloading the related metadata in its cache. Read requests like get_table_req,
however, are served from catalogd's cache, which could be stale. The flag
invalidate_hms_cache_on_ddls decides whether catalogd explicitly invalidates the
table when it delegates a DDL/DML on that table to HMS.
test_cache_valid_on_nontransactional_table_ddls verifies that when
invalidate_hms_cache_on_ddls=false, the cache is not updated and therefore still
holds stale metadata.

However, invoking the HMS APIs generates HMS events. Even when
invalidate_hms_cache_on_ddls=false, catalogd can still update its cache while
processing those events. The test fails when its check runs after catalogd has
applied the event, since the cache is then up to date.
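To make the race concrete, here is a toy Python model. It is a simplification, not catalogd's actual code, and the class and method names are hypothetical. A DDL is delegated to HMS, which queues an event; the flag only controls the synchronous invalidation, while the asynchronous event processor reloads the partition regardless. Whether the test sees stale metadata therefore depends on whether its check runs before or after process_events().

{code:python}
class ToyCatalogd:
    """Hypothetical, simplified model of catalogd serving HMS endpoints
    (--start_hms_server=true). Not the real implementation."""

    def __init__(self, invalidate_hms_cache_on_ddls):
        self.invalidate_on_ddls = invalidate_hms_cache_on_ddls
        # What HMS/storage currently holds for each partition (source of truth).
        self.hms = {"part_col=2": ["file1"]}
        # catalogd's cached file metadata, which may become stale.
        self.cache = {"part_col=2": ["file1"]}
        # Events generated by HMS, consumed later by the event processor.
        self.pending_events = []

    def truncate_partition(self, part):
        # The request is delegated to HMS, which empties the partition and
        # emits an ALTER_PARTITION event.
        self.hms[part] = []
        self.pending_events.append(("ALTER_PARTITION", part))
        # Only when the flag is true does catalogd invalidate synchronously.
        if self.invalidate_on_ddls:
            self.cache.pop(part, None)

    def process_events(self):
        # Asynchronous in the real system; reloads the partition regardless
        # of invalidate_hms_cache_on_ddls.
        while self.pending_events:
            _event_type, part = self.pending_events.pop(0)
            self.cache[part] = list(self.hms[part])

    def get_file_metadata(self, part):
        # Serve from the cache; an invalidated entry is reloaded on access.
        if part not in self.cache:
            self.cache[part] = list(self.hms[part])
        return self.cache[part]


catalogd = ToyCatalogd(invalidate_hms_cache_on_ddls=False)
catalogd.truncate_partition("part_col=2")
before = catalogd.get_file_metadata("part_col=2")  # stale: ["file1"]
catalogd.process_events()
after = catalogd.get_file_metadata("part_col=2")   # reloaded: []
{code}

In this model, "before" matches what the test expects, while "after" matches what the failed run observed once the event was applied.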

In fact, these events should be tracked as self-events, since we don't want
catalogd to process them. But our current self-event tracking mechanism
requires setting two table properties ("impala.events.catalogServiceId" and
"impala.events.catalogVersion"), which is not possible through every HMS API.
E.g. the parameters of truncate_table() are "String dbName, String tblName,
List<String> partNames", so we can't pass a Table object to update the table
properties. As a result, the ALTER event generated from truncate_table() will
never be detected as a self-event. That's why this test is flaky.
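A rough Python sketch of the property-based check described above, assuming a simplified model: the event carries the table (or partition) parameters, and catalogd keeps a hypothetical set of in-flight catalog versions for its own operations. The two property keys come from this comment; the function name and the in-flight set are illustrative only, not catalogd's actual API. It shows why an event from truncate_table() can never match: nothing sets the two properties, so the check falls through.

{code:python}
SERVICE_ID_KEY = "impala.events.catalogServiceId"
VERSION_KEY = "impala.events.catalogVersion"


def is_self_event(event_params, my_service_id, in_flight_versions):
    """Hypothetical, simplified self-event check based on the two properties."""
    service_id = event_params.get(SERVICE_ID_KEY, "")
    version = int(event_params.get(VERSION_KEY, "-1"))
    # Properties absent: what happens for events from APIs such as
    # truncate_table(dbName, tblName, partNames), which take no Table object.
    if version == -1 or not service_id:
        return False
    # Only events stamped by this catalogd with a version it still tracks count.
    return service_id == my_service_id and version in in_flight_versions


# Event from a call where catalogd could set both properties.
print(is_self_event({SERVICE_ID_KEY: "svc-1", VERSION_KEY: "42"}, "svc-1", {42}))  # True
# ALTER event from truncate_table(): the properties can't be set.
print(is_self_event({}, "svc-1", {42}))  # False
{code}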

Here are the catalogd logs of a failed run: 
[^catalogd.impala-ec2-redhat86-m6i-4xlarge-ondemand-1403.vpc.cloudera.com.jenkins.log.INFO.20251115-050206.2588537]
{noformat}
I1115 05:02:14.224659 2589678 CatalogMetastoreServer.java:215] Invoking HMS API: truncate_table_req
I1115 05:02:14.260154 2589017 MetastoreEventsProcessor.java:1025] Received 5 events. First event id: 62107.
I1115 05:02:14.260505 2589018 MetastoreEventsProcessor.java:1163] Latest event in HMS: id=62111, time=1763211734. Last synced event: id=62106, time=1763211731.
W1115 05:02:14.260548 2589018 MetastoreEventsProcessor.java:1166] Lag: 3s. 5 events pending to be processed.
I1115 05:02:14.269137 2589017 MetastoreEvents.java:302] Total number of events received: 5 Total number of events filtered out: 0
I1115 05:02:14.270582 2589017 MetastoreEvents.java:838] EventId: 62107 EventType: ADD_PARTITION Incremented skipped metric to 3 since no partitions were added.
I1115 05:02:14.270838 2589017 MetastoreEvents.java:838] EventId: 62108 EventType: ADD_PARTITION Incremented skipped metric to 4 since no partitions were added.
I1115 05:02:14.270965 2589017 MetastoreEvents.java:838] EventId: 62109 EventType: ADD_PARTITION Incremented skipped metric to 5 since no partitions were added.
I1115 05:02:14.271135 2589017 CatalogOpExecutor.java:5051] Dropping partition part_col=3 of table test_cache_valid_on_nontransactional_table_ddls_dbvurly.test_cache_valid_on_nontransactional_table_ddls_tblazcal since it's create event id 62109 is not higher than eventid 62110
I1115 05:02:14.271164 2589017 CatalogOpExecutor.java:4985] EventId: 62110 Skipping removal of 0/1 partitions since they don't exist or were created later in table test_cache_valid_on_nontransactional_table_ddls_dbvurly.test_cache_valid_on_nontransactional_table_ddls_tblazcal.
I1115 05:02:14.271370 2589017 MetastoreEvents.java:827] EventId: 62110 EventType: DROP_PARTITION 1 partitions dropped from table test_cache_valid_on_nontransactional_table_ddls_dbvurly.test_cache_valid_on_nontransactional_table_ddls_tblazcal
I1115 05:02:14.272054 2589017 CatalogServiceCatalog.java:1216] Not a self-event since the given version is -1 and service id is empty
I1115 05:02:14.272619 2589017 HdfsTable.java:2844] Reloading partition metadata: test_cache_valid_on_nontransactional_table_ddls_dbvurly.test_cache_valid_on_nontransactional_table_ddls_tblazcal part_col=2 (ALTER_PARTITION event)
I1115 05:02:14.272655 2589017 MetaStoreUtil.java:191] Fetching 1 partitions for: test_cache_valid_on_nontransactional_table_ddls_dbvurly.test_cache_valid_on_nontransactional_table_ddls_tblazcal using partition batch size: 1000
I1115 05:02:14.272882 2589678 MetastoreServiceHandler.java:3130] Skipping invalidation of table test_cache_valid_on_nontransactional_table_ddls_dbvurly.test_cache_valid_on_nontransactional_table_ddls_tblazcal due to metastore api truncate_table_req because invalidate hms cache of ddl flag is set to false{noformat}
The stacktrace of the test:
{code:python}
custom_cluster/test_metastore_service.py:465: in test_cache_valid_on_nontransactional_table_ddls
    self.__test_non_transactional_table_cache_helper(db_name, tbl_name, False)
custom_cluster/test_metastore_service.py:596: in __test_non_transactional_table_cache_helper
    assert part.fileMetadata.data is not None
E   assert None is not None
E    +  where None = FileMetadata(data=None, version=1, type=1).data
E    +    where FileMetadata(data=None, version=1, type=1) = Partition(writeId=-1, parameters={'totalSize': '0', 'transient_lastDdlTime': '...vurly.db/test_cache_valid_on_nontransactional_table_ddls_tblazcal/part_col=2')).fileMetadata{code}
The test expects catalogd to still have the stale file metadata after
truncating the partition. However, while processing the ALTER_PARTITION event,
catalogd reloads the partition and learns that it is now empty, so the
assertion fails.

> test_cache_valid_on_nontransactional_table_ddls() fails with assertion error
> ----------------------------------------------------------------------------
>
>                 Key: IMPALA-13607
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13607
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Pranav Yogi Lodha
>            Assignee: Quanlong Huang
>            Priority: Major
>              Labels: broken-build
>
> h2. Error Message
> {noformat}
> assert None is not None + where None = FileMetadata(data=None, version=1, type=1).data + where FileMetadata(data=None, version=1, type=1) = Partition(writeId=-1, parameters={'totalSize': '2', 'transient_lastDdlTime': '...csjtf.db/test_cache_valid_on_nontransactional_table_ddls_tblqchdb/part_col=2')).fileMetadata{noformat}
> h2. Stacktrace
> {noformat}
> custom_cluster/test_metastore_service.py:465: in test_cache_valid_on_nontransactional_table_ddls
>     self.__test_non_transactional_table_cache_helper(db_name, tbl_name, False)
> custom_cluster/test_metastore_service.py:596: in __test_non_transactional_table_cache_helper
>     assert part.fileMetadata.data is not None
> E assert None is not None
> E + where None = FileMetadata(data=None, version=1, type=1).data
> E + where FileMetadata(data=None, version=1, type=1) = Partition(writeId=-1, parameters={'totalSize': '2', 'transient_lastDdlTime': '...csjtf.db/test_cache_valid_on_nontransactional_table_ddls_tblqchdb/part_col=2')).fileMetadata{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)