Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/22559 )
Change subject: IMPALA-11402: Add limit on files fetched by a single getPartialCatalogObject request
......................................................................


Patch Set 17:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/22559/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/22559/14//COMMIT_MSG@12
PS14, Line 12: OOM of exceeding the JVM array limit when serializing the response of
             : a getPartialCatalogObject request for all partitions (thus all files).
             :
             : This patch adds a new flag, catalog_partial_fetch_max_files, to define
             : the max number of file descriptors allowed in a response of
             : getPartialCatalogObject. Catalogd will truncate the response in
             : partition level when
> Is this getPartialCatalogObject RPC only exist in local catalog mode, or us

Yeah, it's only used in local catalog mode.


http://gerrit.cloudera.org:8080/#/c/22559/14/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/22559/14/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2350
PS14, Line 2350: if (numFilesCollected + numFds >
               :     BackendConfig.INSTANCE.getCatalogPartialFetchMaxFiles()) {
               :   if (numFilesCollected == 0) {
               :     // Even collecting the first partition will exceed the limit which me
> What is the recommendation if this error is hit?

Users should compact the files to reduce their number. The coordinator will fail the query. This behavior is tested in test_too_many_files.


http://gerrit.cloudera.org:8080/#/c/22559/14/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
File fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java:

http://gerrit.cloudera.org:8080/#/c/22559/14/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@1116
PS14, Line 1116: }
               : if (logProgress) {
> What happen if catalog version of this table changed between back-to-back r

This is handled in sendRequest(). It will throw an InconsistentMetadataFetchException:
https://github.com/apache/impala/blob/50a98dce46fd337461f67b98979fece95c7bf738/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java#L501-L510

The coordinator will then replan the query at
https://github.com/apache/impala/blob/50a98dce46fd337461f67b98979fece95c7bf738/fe/src/main/java/org/apache/impala/service/Frontend.java#L2473


http://gerrit.cloudera.org:8080/#/c/22559/14/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@1188
PS14, Line 1188: resp.table_info.iceberg_table != null, req,
               : "missing Iceberg table metadata");
               : return resp.getTable_info();
               : }
               : });
> In exception message, say how many files the table actually have.

Yeah, catalogd can return a non-OK TStatus in the response, but we don't use it well yet:
https://github.com/apache/impala/blob/50a98dce46fd337461f67b98979fece95c7bf738/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java#L469-L471

I removed this method and now use the 'status' field of TGetPartialCatalogObjectResponse instead.


http://gerrit.cloudera.org:8080/#/c/22559/14/tests/custom_cluster/test_local_catalog.py
File tests/custom_cluster/test_local_catalog.py:

http://gerrit.cloudera.org:8080/#/c/22559/14/tests/custom_cluster/test_local_catalog.py@675
PS14, Line 675: @pytest.mark.execute_serially
              : @CustomClusterTestSuite.with_args(
              : impalad_args="
> Can this be dropped?

Done


--
To view, visit http://gerrit.cloudera.org:8080/22559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb13fec20de5a17e7fc33613ca5cdebb9ac1a1e5
Gerrit-Change-Number: 22559
Gerrit-PatchSet: 17
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Comment-Date: Tue, 15 Apr 2025 08:55:08 +0000
Gerrit-HasComments: Yes
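
For readers following the HdfsTable.java@2350 thread, the partition-level truncation discussed there can be modeled roughly as below. This is a simplified Python sketch, not the Java code in the patch: only the numFilesCollected + numFds comparison and the "first partition alone exceeds the limit" error case come from the quoted snippet. The names collect_partitions, fetch_all and TooManyFilesError are hypothetical, and the assumption that the client re-requests the partitions left out of a truncated response is inferred from the "back-to-back" requests mentioned in the CatalogdMetaProvider.java thread.

```python
from typing import Dict, List


class TooManyFilesError(Exception):
    """Hypothetical error: one partition alone exceeds the file limit."""


def collect_partitions(file_counts: Dict[int, int],
                       requested_ids: List[int],
                       max_files: int) -> List[int]:
    """Model of the server side: return the leading partitions that fit.

    Mirrors the quoted check: stop before the partition that would push
    the number of collected files past the limit, and fail if even the
    first requested partition does not fit.
    """
    collected: List[int] = []
    num_files_collected = 0
    for part_id in requested_ids:
        num_fds = file_counts[part_id]
        if num_files_collected + num_fds > max_files:
            if num_files_collected == 0:
                raise TooManyFilesError(
                    "partition %d has %d files, limit is %d"
                    % (part_id, num_fds, max_files))
            break  # Truncate at a partition boundary.
        collected.append(part_id)
        num_files_collected += num_fds
    return collected


def fetch_all(file_counts: Dict[int, int],
              requested_ids: List[int],
              max_files: int) -> List[List[int]]:
    """Assumed client side: keep requesting whatever was left out."""
    remaining = list(requested_ids)
    batches: List[List[int]] = []
    while remaining:
        got = collect_partitions(file_counts, remaining, max_files)
        batches.append(got)
        remaining = remaining[len(got):]
    return batches


if __name__ == "__main__":
    counts = {1: 3, 2: 4, 3: 5, 4: 2}  # partition id -> number of files
    print(fetch_all(counts, [1, 2, 3, 4], max_files=8))
```

The sketch ignores catalog version changes between requests, which the reply above says are handled separately in sendRequest() via InconsistentMetadataFetchException.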