[
https://issues.apache.org/jira/browse/IMPALA-13607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18045419#comment-18045419
]
Quanlong Huang commented on IMPALA-13607:
-----------------------------------------
custom_cluster/test_metastore_service.py is a test verifying the feature of
providing HMS endpoints from catalogd, i.e. running with
--start_hms_server=true (IMPALA-10612).
For all the DDL/DML requests, catalogd just delegates them to HMS APIs without
reloading related metadata in the cache. For read requests like get_table_req,
catalogd serves them from its cache which could be stale. There is a flag,
invalidate_hms_cache_on_ddls, to decide whether to explicitly invalidate the
table when catalogd delegates a DDL/DML on the table to HMS.
test_cache_valid_on_nontransactional_table_ddls verifies that when
invalidate_hms_cache_on_ddls=false, the cache is not updated and therefore
still holds stale metadata.
However, there are HMS events generated from invoking the HMS APIs. Even when
invalidate_hms_cache_on_ddls=false, catalogd can still update its cache when
processing the corresponding HMS events. The test fails when its check is done
after catalogd applies the event (so the cache is up-to-date).
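
To make the race concrete, here is a minimal toy model of the behaviour
described above. It is only a sketch: ToyHms, ToyCatalogd and run() are
hypothetical names invented for illustration and do not correspond to the real
catalogd classes, which are far more involved.

```python
class ToyHms:
    """Hypothetical stand-in for HMS: holds the authoritative file list."""

    def __init__(self):
        self.files = {}  # table name -> list of data files

    def truncate_table(self, table):
        self.files[table] = []


class ToyCatalogd:
    """Hypothetical stand-in for catalogd with --start_hms_server=true."""

    def __init__(self, hms, invalidate_hms_cache_on_ddls):
        self.hms = hms
        self.invalidate_hms_cache_on_ddls = invalidate_hms_cache_on_ddls
        self.cache = {}

    def get_table(self, table):
        # Reads are served from the cache; it is loaded only on a miss.
        if table not in self.cache:
            self.cache[table] = list(self.hms.files[table])
        return self.cache[table]

    def truncate_table(self, table):
        # The DDL is delegated to HMS; the cache is dropped only if the
        # flag asks for it.
        self.hms.truncate_table(table)
        if self.invalidate_hms_cache_on_ddls:
            self.cache.pop(table, None)

    def process_hms_event(self, table):
        # Models the event processor reloading metadata for an event that
        # was not recognised as a self-event.
        self.cache[table] = list(self.hms.files[table])


def run(flag, event_processed_before_check):
    hms = ToyHms()
    hms.files["t"] = ["part_col=2/data.txt"]
    catalogd = ToyCatalogd(hms, flag)
    catalogd.get_table("t")  # warm the cache
    catalogd.truncate_table("t")
    if event_processed_before_check:
        catalogd.process_hms_event("t")
    return catalogd.get_table("t")
```

In this model, run(False, False) still returns the stale file list, which is
what the test expects, while run(False, True) returns an empty list, which
corresponds to the failing case.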
In fact, these events should be tracked as self-events since we don't want
catalogd to process them. But our current mechanism of self-event tracking
requires setting two table properties ("impala.events.catalogServiceId" and
"impala.events.catalogVersion"), which can't be done in all HMS APIs. E.g. the
parameters of truncate_table() are "String dbName, String tblName, List<String>
partNames". We can't pass a Table object to update the table properties. So the
ALTER event generated from truncate_table() will never be detected as a
self-event. That's why the test in this issue is flaky.
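
The property-based check can be sketched roughly as follows. This is a
simplified illustration, not the actual Impala implementation: the function
name is_self_event and the in_flight_versions set are assumptions made for the
example. Only the two property keys come from the description above.

```python
SERVICE_ID_KEY = "impala.events.catalogServiceId"
VERSION_KEY = "impala.events.catalogVersion"


def is_self_event(table_params, my_service_id, in_flight_versions):
    """Return True if the event carries this catalogd's markers.

    table_params: the table properties attached to the event.
    my_service_id: this catalogd's service id (hypothetical parameter).
    in_flight_versions: catalog versions of pending local changes
        (hypothetical bookkeeping for this sketch).
    """
    service_id = table_params.get(SERVICE_ID_KEY, "")
    version = int(table_params.get(VERSION_KEY, -1))
    if not service_id or version == -1:
        # Nothing was stamped on the table, so the event cannot be
        # attributed to this catalogd.
        return False
    return service_id == my_service_id and version in in_flight_versions


# An API that takes a Table object can stamp the properties first.
stamped = {SERVICE_ID_KEY: "svc-1", VERSION_KEY: "42"}
# An event from truncate_table() carries no such properties.
unstamped = {"totalSize": "0"}
```

With these inputs, is_self_event(stamped, "svc-1", {42}) is True, while
is_self_event(unstamped, "svc-1", {42}) is False, which mirrors the case where
neither a version nor a service id is present on the event.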
Here are the catalogd logs of a failed run:
[^catalogd.impala-ec2-redhat86-m6i-4xlarge-ondemand-1403.vpc.cloudera.com.jenkins.log.INFO.20251115-050206.2588537]
{noformat}
I1115 05:02:14.224659 2589678 CatalogMetastoreServer.java:215] Invoking HMS
API: truncate_table_req
I1115 05:02:14.260154 2589017 MetastoreEventsProcessor.java:1025] Received 5
events. First event id: 62107.
I1115 05:02:14.260505 2589018 MetastoreEventsProcessor.java:1163] Latest event
in HMS: id=62111, time=1763211734. Last synced event: id=62106, time=1763211731.
W1115 05:02:14.260548 2589018 MetastoreEventsProcessor.java:1166] Lag: 3s. 5
events pending to be processed.
I1115 05:02:14.269137 2589017 MetastoreEvents.java:302] Total number of events
received: 5 Total number of events filtered out: 0
I1115 05:02:14.270582 2589017 MetastoreEvents.java:838] EventId: 62107
EventType: ADD_PARTITION Incremented skipped metric to 3 since no partitions
were added.
I1115 05:02:14.270838 2589017 MetastoreEvents.java:838] EventId: 62108
EventType: ADD_PARTITION Incremented skipped metric to 4 since no partitions
were added.
I1115 05:02:14.270965 2589017 MetastoreEvents.java:838] EventId: 62109
EventType: ADD_PARTITION Incremented skipped metric to 5 since no partitions
were added.
I1115 05:02:14.271135 2589017 CatalogOpExecutor.java:5051] Dropping partition
part_col=3 of table
test_cache_valid_on_nontransactional_table_ddls_dbvurly.test_cache_valid_on_nontransactional_table_ddls_tblazcal
since it's create event id 62109 is not higher than eventid 62110
I1115 05:02:14.271164 2589017 CatalogOpExecutor.java:4985] EventId: 62110
Skipping removal of 0/1 partitions since they don't exist or were created later
in table
test_cache_valid_on_nontransactional_table_ddls_dbvurly.test_cache_valid_on_nontransactional_table_ddls_tblazcal.
I1115 05:02:14.271370 2589017 MetastoreEvents.java:827] EventId: 62110
EventType: DROP_PARTITION 1 partitions dropped from table
test_cache_valid_on_nontransactional_table_ddls_dbvurly.test_cache_valid_on_nontransactional_table_ddls_tblazcal
I1115 05:02:14.272054 2589017 CatalogServiceCatalog.java:1216] Not a self-event
since the given version is -1 and service id is empty
I1115 05:02:14.272619 2589017 HdfsTable.java:2844] Reloading partition
metadata:
test_cache_valid_on_nontransactional_table_ddls_dbvurly.test_cache_valid_on_nontransactional_table_ddls_tblazcal
part_col=2 (ALTER_PARTITION event)
I1115 05:02:14.272655 2589017 MetaStoreUtil.java:191] Fetching 1 partitions
for:
test_cache_valid_on_nontransactional_table_ddls_dbvurly.test_cache_valid_on_nontransactional_table_ddls_tblazcal
using partition batch size: 1000
I1115 05:02:14.272882 2589678 MetastoreServiceHandler.java:3130] Skipping
invalidation of table
test_cache_valid_on_nontransactional_table_ddls_dbvurly.test_cache_valid_on_nontransactional_table_ddls_tblazcal
due to metastore api truncate_table_req because invalidate hms cache of ddl
flag is set to false{noformat}
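
The interleaving in the log above is easier to see once the wrapped lines are
joined and grouped by thread id. The following is a small sketch of such a
parser, assuming the usual glog prefix layout (level, MMDD, time, thread id,
file:line). SAMPLE holds three of the lines above with the table name
abbreviated to <tbl>, and parse_glog is a hypothetical helper name.

```python
import re
from datetime import timedelta

GLOG_RE = re.compile(
    r"^(?P<level>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2})\.(?P<us>\d{6}) "
    r"(?P<tid>\d+) (?P<src>[^:\s]+:\d+)\] (?P<msg>.*)$")


def parse_glog(text):
    """Parse glog lines, appending wrapped continuation lines to the
    message of the preceding record."""
    records = []
    for line in text.splitlines():
        m = GLOG_RE.match(line)
        if m:
            t = timedelta(hours=int(m["h"]), minutes=int(m["m"]),
                          seconds=int(m["s"]), microseconds=int(m["us"]))
            records.append({"time": t, "tid": m["tid"],
                            "src": m["src"], "msg": m["msg"]})
        elif records and line.strip():
            records[-1]["msg"] += " " + line.strip()
    return records


SAMPLE = """\
I1115 05:02:14.224659 2589678 CatalogMetastoreServer.java:215] Invoking HMS
API: truncate_table_req
I1115 05:02:14.272619 2589017 HdfsTable.java:2844] Reloading partition
metadata: <tbl> part_col=2 (ALTER_PARTITION event)
I1115 05:02:14.272882 2589678 MetastoreServiceHandler.java:3130] Skipping
invalidation of table <tbl>
"""

records = parse_glog(SAMPLE)
for r in records:
    print(r["time"], r["tid"], r["src"], r["msg"])
```

Thread 2589678 logs the truncate_table_req call and thread 2589017 logs the
event processing. The partition reload on 2589017 is logged before 2589678
logs that it skips the invalidation.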
The stacktrace of the test:
{code:python}
custom_cluster/test_metastore_service.py:465: in
test_cache_valid_on_nontransactional_table_ddls
self.__test_non_transactional_table_cache_helper(db_name, tbl_name, False)
custom_cluster/test_metastore_service.py:596: in
__test_non_transactional_table_cache_helper
assert part.fileMetadata.data is not None
E assert None is not None
E + where None = FileMetadata(data=None, version=1, type=1).data
E + where FileMetadata(data=None, version=1, type=1) =
Partition(writeId=-1, parameters={'totalSize': '0', 'transient_lastDdlTime':
'...vurly.db/test_cache_valid_on_nontransactional_table_ddls_tblazcal/part_col=2')).fileMetadata{code}
The test expects catalogd to still have the stale file metadata after
truncating the partition. But because it processes the ALTER_PARTITION event,
catalogd reloads the partition and learns that it is now empty. So the test
fails.
> test_cache_valid_on_nontransactional_table_ddls() fails with assertion error
> ----------------------------------------------------------------------------
>
> Key: IMPALA-13607
> URL: https://issues.apache.org/jira/browse/IMPALA-13607
> Project: IMPALA
> Issue Type: Bug
> Reporter: Pranav Yogi Lodha
> Assignee: Quanlong Huang
> Priority: Major
> Labels: broken-build
>
> h2. Error Message
> {noformat}
> assert None is not None + where None = FileMetadata(data=None, version=1,
> type=1).data + where FileMetadata(data=None, version=1, type=1) =
> Partition(writeId=-1, parameters={'totalSize': '2', 'transient_lastDdlTime':
> '...csjtf.db/test_cache_valid_on_nontransactional_table_ddls_tblqchdb/part_col=2')).fileMetadata{noformat}
> h2. Stacktrace
> {noformat}
> custom_cluster/test_metastore_service.py:465: in
> test_cache_valid_on_nontransactional_table_ddls
> self.__test_non_transactional_table_cache_helper(db_name, tbl_name, False)
> custom_cluster/test_metastore_service.py:596: in
> __test_non_transactional_table_cache_helper assert part.fileMetadata.data is
> not None
> E assert None is not None
> E + where None = FileMetadata(data=None, version=1, type=1).data
> E + where FileMetadata(data=None, version=1, type=1) = Partition(writeId=-1,
> parameters={'totalSize': '2', 'transient_lastDdlTime':
> '...csjtf.db/test_cache_valid_on_nontransactional_table_ddls_tblqchdb/part_col=2')).fileMetadata{noformat}