Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/22559 )

Change subject: IMPALA-11402: Add limit on files fetched by a single getPartialCatalogObject request
......................................................................


Patch Set 14:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/22559/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/22559/14//COMMIT_MSG@12
PS14, Line 12: 
             : This patch adds a new flag, catalog_partial_fetch_max_files, to define
             : the max number of file descriptors allowed in a response of
             : getPartialCatalogObject. Catalogd will truncate the response in
             : partition level when it's too big, and only return a subset of the
             : requested partitions. Coordinator should send new requests to fetch the
             : remaining partitions.
Does the getPartialCatalogObject RPC exist only in local catalog mode, or is it
used in legacy mode as well? Please clarify this in the commit message and in
the help string of the catalog_partial_fetch_max_files flag.
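
For reference, this is how I read "truncate the response in partition level".
It is a minimal sketch, not the patch's code: Part, truncate, and maxFiles are
hypothetical names, and I am assuming partitions are kept whole and returned in
request order. Under that reading, a first partition that alone exceeds the
limit produces an empty result.

```java
import java.util.ArrayList;
import java.util.List;

final class PartitionTruncationSketch {
  /** Hypothetical stand-in for a partition and its file-descriptor count. */
  static final class Part {
    final long id;
    final int numFiles;

    Part(long id, int numFiles) {
      this.id = id;
      this.numFiles = numFiles;
    }
  }

  /**
   * Returns the longest prefix of 'requested' whose total file count stays
   * within 'maxFiles'. Partitions are never split.
   */
  static List<Part> truncate(List<Part> requested, int maxFiles) {
    List<Part> result = new ArrayList<>();
    long total = 0;
    for (Part p : requested) {
      // Stop at the first partition that would push the response over the limit.
      if (total + p.numFiles > maxFiles) break;
      total += p.numFiles;
      result.add(p);
    }
    return result;
  }
}
```

If this matches the intended semantics, it would help to state in the flag's
help string that the limit applies per response and not per table.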


http://gerrit.cloudera.org:8080/#/c/22559/14/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/22559/14/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2350
PS14, Line 2350:         LOG.error("Too many files in table {}{}: {}. Current limit is {} " +
               :                 "configured by catalog_partial_fetch_max_files.", full_name_,
               :             isPartitioned() ? " partition " + part.getPartitionName() : "",
               :             numFds, BackendConfig.INSTANCE.getCatalogPartialFetchMaxFiles());
What is the recommendation if this error is hit?
How can the coordinator recover from this error?


http://gerrit.cloudera.org:8080/#/c/22559/14/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
File fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java:

http://gerrit.cloudera.org:8080/#/c/22559/14/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@1116
PS14, Line 1116:       TGetPartialCatalogObjectRequest nextReq = newReqForPartitions(table, remainingIds);
               :       TGetPartialCatalogObjectResponse nextResp = sendRequest(nextReq);
What happens if the catalog version of this table changes between back-to-back
requests?
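
I do not know how the patch handles this, so the following is only a sketch of
one possible approach: remember the version from the first response and fail
the whole fetch if a later response reports a different one. Resp, fetchAll,
and the rpc callback are hypothetical names, not the CatalogdMetaProvider API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class VersionCheckedFetchSketch {
  /** Hypothetical slice of a response: table version plus returned partition ids. */
  static final class Resp {
    final long tableVersion;
    final List<Long> returnedIds;

    Resp(long tableVersion, List<Long> returnedIds) {
      this.tableVersion = tableVersion;
      this.returnedIds = new ArrayList<>(returnedIds);
    }
  }

  /** Fetches all ids over several requests; all responses must share one version. */
  static List<Long> fetchAll(List<Long> ids, Function<List<Long>, Resp> rpc) {
    List<Long> fetched = new ArrayList<>();
    List<Long> remaining = new ArrayList<>(ids);
    boolean first = true;
    long expectedVersion = 0;
    while (!remaining.isEmpty()) {
      Resp resp = rpc.apply(new ArrayList<>(remaining));
      if (first) {
        expectedVersion = resp.tableVersion;
        first = false;
      } else if (resp.tableVersion != expectedVersion) {
        // Mixing partitions from two versions could give an inconsistent view.
        throw new IllegalStateException("Table version changed from "
            + expectedVersion + " to " + resp.tableVersion);
      }
      if (resp.returnedIds.isEmpty()) {
        // No progress is possible; avoid looping forever.
        throw new IllegalStateException("Empty response for remaining partitions");
      }
      fetched.addAll(resp.returnedIds);
      remaining.removeAll(resp.returnedIds);
    }
    return fetched;
  }
}
```

Whether the caller should then retry or surface the failure is a separate
decision; the point is only that the follow-up loop needs some defined
behavior for this case.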


http://gerrit.cloudera.org:8080/#/c/22559/14/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@1188
PS14, Line 1188:       throw new CatalogException(String.format("Table %s.%s%s has too " +
               :               "many files. Try increasing catalog_partial_fetch_max_files to a higher " +
               :               "value and restart catalogd. Check current limit in catalogd logs.",
               :           table.dbName_, table.tableName_,
               :           table.isPartitioned() ? " partition " + failedPart.get().getName() : ""));
In the exception message, say how many files the table actually has.

It is strange that this exception is raised by inferring the problem from
resp.table_info.partitions.isEmpty(). Can catalogd propagate an exception
through resp instead, or add a dedicated field to resp that describes the issue?
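
To illustrate the second option, here is a rough sketch of a response that
reports the violation explicitly, so the coordinator does not have to infer it
from an empty partition list. FetchStatus, Resp, and errorMessage are
hypothetical and do not exist in the current Thrift definitions.

```java
import java.util.Optional;

final class ExplicitLimitErrorSketch {
  /** Hypothetical status values carried in the response. */
  enum FetchStatus { OK, TRUNCATED, TOO_MANY_FILES }

  /** Hypothetical slice of a response with an explicit status and file counts. */
  static final class Resp {
    final FetchStatus status;
    final String tableName;
    final long numFiles;
    final long maxFiles;

    Resp(FetchStatus status, String tableName, long numFiles, long maxFiles) {
      this.status = status;
      this.tableName = tableName;
      this.numFiles = numFiles;
      this.maxFiles = maxFiles;
    }
  }

  /** Builds an error message only when the response reports a limit violation. */
  static Optional<String> errorMessage(Resp resp) {
    if (resp.status != FetchStatus.TOO_MANY_FILES) return Optional.empty();
    return Optional.of(String.format(
        "Table %s has %d files, which exceeds the limit of %d set by "
            + "catalog_partial_fetch_max_files.",
        resp.tableName, resp.numFiles, resp.maxFiles));
  }
}
```

With something like this, the message can include both the actual file count
and the current limit without asking the user to look in the catalogd logs.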


http://gerrit.cloudera.org:8080/#/c/22559/14/tests/custom_cluster/test_local_catalog.py
File tests/custom_cluster/test_local_catalog.py:

http://gerrit.cloudera.org:8080/#/c/22559/14/tests/custom_cluster/test_local_catalog.py@675
PS14, Line 675:   @classmethod
              :   def get_workload(self):
              :     return 'tpcds'
Can this be dropped?

The tests use fully qualified table names everywhere, and also use databases
other than tpcds*.



--
To view, visit http://gerrit.cloudera.org:8080/22559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb13fec20de5a17e7fc33613ca5cdebb9ac1a1e5
Gerrit-Change-Number: 22559
Gerrit-PatchSet: 14
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Comment-Date: Mon, 14 Apr 2025 16:45:05 +0000
Gerrit-HasComments: Yes
