Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/22559 )

Change subject: IMPALA-11402: Add limit on files fetched by a single getPartialCatalogObject request
......................................................................


Patch Set 17:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/22559/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/22559/14//COMMIT_MSG@12
PS14, Line 12: OOM of exceeding the JVM array limit when serializing the response of
             : a getPartialCatalogObject request for all partitions (thus all files).
             :
             : This patch adds a new flag, catalog_partial_fetch_max_files, to define
             : the max number of file descriptors allowed in a response of
             : getPartialCatalogObject. Catalogd will truncate the response in
             : partition level when
> Is this getPartialCatalogObject RPC only exist in local catalog mode, or us
Yeah, it's only used in local catalog mode.


http://gerrit.cloudera.org:8080/#/c/22559/14/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/22559/14/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2350
PS14, Line 2350:     if (numFilesCollected + numFds >
               :         BackendConfig.INSTANCE.getCatalogPartialFetchMaxFiles()) {
               :       if (numFilesCollected == 0) {
               :         // Even collecting the first partition will exceed the limit which me
> What is the recommendation if this error is hit?
The user should compact the table's files to reduce the file count. When this
error is hit, the coordinator fails the query. This behavior is tested in
test_too_many_files.
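
For readers following along, here is a minimal sketch of the partition-level
truncation check quoted above. It is not the actual HdfsTable code: the class
name, method name, and exception type are hypothetical, and the limit is passed
in as a plain parameter instead of being read from
catalog_partial_fetch_max_files.

```java
/** Hypothetical sketch of the truncation check; not the real HdfsTable code. */
final class PartialFetchSketch {
  /** Hypothetical error raised when one partition alone exceeds the limit. */
  static final class TooManyFilesException extends RuntimeException {
    TooManyFilesException(String msg) { super(msg); }
  }

  /**
   * Returns how many leading partitions fit into a single response.
   * fileCounts[i] is the number of file descriptors of the i-th requested
   * partition; maxFiles stands in for the flag value (an assumption here).
   */
  static int countPartitionsThatFit(int[] fileCounts, long maxFiles) {
    long numFilesCollected = 0;
    int numPartitions = 0;
    for (int numFds : fileCounts) {
      if (numFilesCollected + numFds > maxFiles) {
        if (numFilesCollected == 0) {
          // Even the first partition exceeds the limit, so truncation
          // cannot help and the request has to fail.
          throw new TooManyFilesException("Partition has " + numFds
              + " files, which exceeds the limit of " + maxFiles);
        }
        // Truncate here; the caller fetches the rest in a later request.
        break;
      }
      numFilesCollected += numFds;
      numPartitions++;
    }
    return numPartitions;
  }
}
```

With file counts {3, 4, 5} and a limit of 8, the sketch keeps the first two
partitions and leaves the third for a follow-up request.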


http://gerrit.cloudera.org:8080/#/c/22559/14/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
File fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java:

http://gerrit.cloudera.org:8080/#/c/22559/14/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@1116
PS14, Line 1116:     }
               :     if (logProgress) {
> What happen if catalog version of this table changed between back-to-back r
This is handled in sendRequest(), which throws an
InconsistentMetadataFetchException:
https://github.com/apache/impala/blob/50a98dce46fd337461f67b98979fece95c7bf738/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java#L501-L510

The coordinator then replans the query at
https://github.com/apache/impala/blob/50a98dce46fd337461f67b98979fece95c7bf738/fe/src/main/java/org/apache/impala/service/Frontend.java#L2473
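
As an illustration only, the detect-and-replan pattern can be sketched as
follows. The names below (ReplanSketch, checkVersion, planWithRetries,
InconsistentMetadataSketchException) are hypothetical stand-ins, not the real
CatalogdMetaProvider or Frontend code, and the retry limit is an assumed
parameter.

```java
import java.util.function.Supplier;

/** Hypothetical sketch of version checking plus replanning. */
final class ReplanSketch {
  /** Stand-in for InconsistentMetadataFetchException. */
  static final class InconsistentMetadataSketchException
      extends RuntimeException {
    InconsistentMetadataSketchException(String msg) { super(msg); }
  }

  /** Fails a batch whose table version differs from the one seen first. */
  static void checkVersion(long expectedVersion, long actualVersion) {
    if (expectedVersion != actualVersion) {
      throw new InconsistentMetadataSketchException("Catalog version changed from "
          + expectedVersion + " to " + actualVersion);
    }
  }

  /** Re-runs planning from scratch whenever metadata turned out stale. */
  static <T> T planWithRetries(Supplier<T> planOnce, int maxAttempts) {
    InconsistentMetadataSketchException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return planOnce.get();
      } catch (InconsistentMetadataSketchException e) {
        last = e;  // Metadata changed mid-fetch; plan again.
      }
    }
    throw new IllegalStateException(
        "Gave up after " + maxAttempts + " attempts", last);
  }
}
```

The point of the sketch is that a version change between two batches discards
the partial result as a whole, so batches from different catalog versions are
never mixed.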


http://gerrit.cloudera.org:8080/#/c/22559/14/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@1188
PS14, Line 1188:                 resp.table_info.iceberg_table != null, req,
               :                 "missing Iceberg table metadata");
               :             return resp.getTable_info();
               :           }
               :     });
> In exception message, say how many files the table actually have.
Yeah, catalogd can return a non-OK TStatus in the response, but we don't make
good use of it yet:
https://github.com/apache/impala/blob/50a98dce46fd337461f67b98979fece95c7bf738/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java#L469-L471

I removed this method and now use the 'status' field of
TGetPartialCatalogObjectResponse instead.
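
A rough sketch of the idea, using hypothetical plain-Java stand-ins for the
Thrift response and status types (the real TStatus and
TGetPartialCatalogObjectResponse classes are generated and look different):

```java
import java.util.List;

/** Hypothetical sketch: surface a non-OK status carried in the response. */
final class StatusCheckSketch {
  /** Stand-in for a status object with an OK flag and error messages. */
  static final class Status {
    final boolean ok;
    final List<String> errorMsgs;
    Status(boolean ok, List<String> errorMsgs) {
      this.ok = ok;
      this.errorMsgs = errorMsgs;
    }
  }

  /** Stand-in for the response; tableInfo is a placeholder payload. */
  static final class Response {
    final Status status;
    final String tableInfo;
    Response(Status status, String tableInfo) {
      this.status = status;
      this.tableInfo = tableInfo;
    }
  }

  /** Returns the payload, or throws with the server-provided messages. */
  static String unwrap(Response resp) {
    if (resp.status != null && !resp.status.ok) {
      // The server-side message can carry details such as the file count.
      throw new IllegalStateException(String.join("; ", resp.status.errorMsgs));
    }
    return resp.tableInfo;
  }
}
```

Carrying the error in the response lets catalogd put details such as the actual
file count into the message that the coordinator reports.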


http://gerrit.cloudera.org:8080/#/c/22559/14/tests/custom_cluster/test_local_catalog.py
File tests/custom_cluster/test_local_catalog.py:

http://gerrit.cloudera.org:8080/#/c/22559/14/tests/custom_cluster/test_local_catalog.py@675
PS14, Line 675:   @pytest.mark.execute_serially
              :   @CustomClusterTestSuite.with_args(
              :     impalad_args="
> Can this be dropped?
Done



--
To view, visit http://gerrit.cloudera.org:8080/22559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb13fec20de5a17e7fc33613ca5cdebb9ac1a1e5
Gerrit-Change-Number: 22559
Gerrit-PatchSet: 17
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Comment-Date: Tue, 15 Apr 2025 08:55:08 +0000
Gerrit-HasComments: Yes
