Mihaly Szjatinya has uploaded this change for review. ( http://gerrit.cloudera.org:8080/24041 )
Change subject: IMPALA-14583: Support partial RPC dispatch for Iceberg tables
......................................................................

IMPALA-14583: Support partial RPC dispatch for Iceberg tables

This patch extends IMPALA-11402 to support partial RPC dispatch for
Iceberg tables in local catalog mode. IMPALA-11402 added support for
HDFS partitioned tables where catalogd can truncate the response of
getPartialCatalogObject at partition boundaries when the file count
exceeds catalog_partial_fetch_max_files.

For Iceberg tables, the file list is not organized by partitions but
stored as a flat list of data and delete files. This patch implements
offset-based pagination to allow catalogd to truncate the response at
any point in the file list, not just at partition boundaries.

Implementation details:
- Added iceberg_file_offset field to TTableInfoSelector thrift struct
- IcebergContentFileStore.toThriftPartial() supports pagination with
  offset and limit parameters
- IcebergContentFileStore uses a reverse lookup table
  (icebergFileOffsetToContentFile_) for efficient offset-based access
  to files
- IcebergTable.getPartialInfo() enforces the file limit configured by
  catalog_partial_fetch_max_files (reusing the flag from IMPALA-11402)
- CatalogdMetaProvider.loadIcebergTableWithRetry() implements the retry
  loop on the coordinator side, sending follow-up requests with
  incremented offsets until all files are fetched
- Coordinator detects catalog version changes between requests and
  throws InconsistentMetadataFetchException for query replanning

Key differences from IMPALA-11402:
- Offset-based pagination instead of partition-based (can split
  anywhere)
- Single flat file list instead of per-partition file lists
- Works with both data files and delete files (Iceberg v2)

Tests:
- Added two custom-cluster tests in TestAllowIncompleteData:
  * test_incomplete_iceberg_file_list: 150 data files with limit=100
  * test_iceberg_with_delete_files: 60+ data+delete files with limit=50
- Both tests verify partial fetch across multiple requests and proper
  log messages for truncation warnings and request counts

Change-Id: I7f2c058b7cc8efc15bac9fe0e91baadbb7b92cbb
---
M be/src/catalog/catalog-server.h
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M tests/custom_cluster/test_local_catalog.py
7 files changed, 372 insertions(+), 10 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/24041/1
--
To view, visit http://gerrit.cloudera.org:8080/24041
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I7f2c058b7cc8efc15bac9fe0e91baadbb7b92cbb
Gerrit-Change-Number: 24041
Gerrit-PatchSet: 1
Gerrit-Owner: Mihaly Szjatinya <[email protected]>
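
Illustrative note on the offset-based pagination described in the commit message: the flow can be sketched roughly as below. This is a simplified Python model, not the patch's Java code. The names PartialResponse, to_partial, fetch_all_files and the total_files field are hypothetical and chosen only for illustration; the actual patch may signal truncation differently. InconsistentMetadataFetchError stands in for Impala's InconsistentMetadataFetchException.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


class InconsistentMetadataFetchError(Exception):
    """Stand-in for Impala's InconsistentMetadataFetchException."""


@dataclass
class PartialResponse:
    # Hypothetical response shape; the real Thrift structs differ.
    catalog_version: int
    files: List[str]
    total_files: int  # assumption: the server reports the full file count


def to_partial(all_files: List[str], catalog_version: int,
               offset: int, limit: int) -> PartialResponse:
    """Catalogd side: return at most `limit` files starting at `offset`."""
    page = all_files[offset:offset + limit]
    return PartialResponse(catalog_version, page, len(all_files))


def fetch_all_files(
    send_request: Callable[[int], PartialResponse],
) -> Tuple[List[str], int]:
    """Coordinator side: send follow-up requests with increasing offsets
    until every file has been fetched. Returns (files, request_count)."""
    first = send_request(0)
    files = list(first.files)
    requests = 1
    while len(files) < first.total_files:
        # The next offset is simply the number of files received so far.
        resp = send_request(len(files))
        requests += 1
        if resp.catalog_version != first.catalog_version:
            # The table changed between requests; the caller should replan.
            raise InconsistentMetadataFetchError(
                "catalog version changed from %d to %d"
                % (first.catalog_version, resp.catalog_version))
        if not resp.files:
            # Guard against looping forever on an unexpectedly empty page.
            raise InconsistentMetadataFetchError("empty page before end of list")
        files.extend(resp.files)
    return files, requests


if __name__ == "__main__":
    # Mirrors the shape of test_incomplete_iceberg_file_list:
    # 150 files with a limit of 100 per response.
    table_files = ["file_%d.parquet" % i for i in range(150)]
    fetched, count = fetch_all_files(
        lambda offset: to_partial(table_files, 7, offset, 100))
    print(len(fetched), "files fetched in", count, "requests")
```

In this model a flat list can be cut at any index, which is what distinguishes it from the partition-boundary truncation of IMPALA-11402; comparing the catalog version on every follow-up response is what keeps a multi-request fetch from mixing two table states.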
