Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22559 )

Change subject: IMPALA-11402: Add limit on files fetched by a single 
getPartialCatalogObject request
......................................................................


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/22559/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/22559/1//COMMIT_MSG@28
PS1, Line 28: Choose 1000000 as the default value for now. We can tune it in the
> That's a pretty significant difference. If that is the overall time, we sho
Sorry, this is not the overall time, just the duration of a single RPC. 
E.g. with the limit set to 1M, a table of 6M files and partitions requires 
6 round-trips. Each takes 1s487ms on the catalogd side, so the total time on 
the catalogd side is 1s487ms * 6 = 8s922ms. Translated to the overall time on 
the catalogd side, the limits and their corresponding times are:
* 1M: 8s922ms
* 2M: 12s105ms
* 3M: 13s286ms
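
The 1M figure above can be reproduced with a small sketch. The class and method names are hypothetical and not part of Impala. It assumes every round-trip costs the same 1s487ms and that the number of round-trips is the file count divided by the limit, rounded up:

```java
public class FetchCostEstimate {
  // Round-trips needed when each response carries at most `limit` files
  // (ceiling division). Assumes files are the only thing that is limited.
  public static long roundTrips(long totalFiles, long limit) {
    return (totalFiles + limit - 1) / limit;
  }

  // Total catalogd-side time, assuming each round-trip costs the same.
  public static long totalMillis(long totalFiles, long limit, long perRpcMillis) {
    return roundTrips(totalFiles, limit) * perRpcMillis;
  }

  // Formats milliseconds in the "8s922ms" style used in this comment.
  public static String format(long millis) {
    return String.format("%ds%03dms", millis / 1000, millis % 1000);
  }

  public static void main(String[] args) {
    long total = totalMillis(6_000_000L, 1_000_000L, 1487L);
    // Prints "6 round-trips, 8s922ms"
    System.out.println(roundTrips(6_000_000L, 1_000_000L) + " round-trips, "
        + format(total));
  }
}
```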

Setting a smaller limit requires more requests and holding the table lock more 
times. If the table changes (by DDL/REFRESH) between them, the coordinator has 
to resend some requests.

I'll update the limit to 1M for now and tune it later. Since it's configurable, 
we don't need to wait too long for the tests here.


http://gerrit.cloudera.org:8080/#/c/22559/1/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
File fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java:

http://gerrit.cloudera.org:8080/#/c/22559/1/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@1105
PS1, Line 1105:           ids.size() - numFetchedParts);
> Can this be bounded by backendCfg_.catalog_partial_fetch_max_files to not p
These are partition ids. Before sending the request, the coordinator doesn't 
know how many files they contain. Some partitions might be empty. Some might 
have lots of files. Catalogd truncates the response based on the number of 
files actually collected.

On the other hand, these ids are sent as a numeric list, so the request size is 
not that large compared to the response size.
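
A minimal sketch of this interaction follows. It is not the actual Impala code, and all names in it are hypothetical. It assumes catalogd adds whole partitions until the collected file count reaches the limit, and always returns at least one partition so that the coordinator makes progress. The coordinator sends all remaining ids each time and advances by however many partitions came back:

```java
import java.util.List;

public class TruncatedFetchSketch {
  // Hypothetical stand-in for catalogd: walks the requested partitions in
  // order and stops once the collected file count reaches maxFiles.
  // Returns how many partitions were included in the (truncated) response.
  public static int serve(List<Integer> fileCounts, int maxFiles) {
    int files = 0;
    int parts = 0;
    for (int count : fileCounts) {
      files += count;
      parts++;
      if (files >= maxFiles) break;  // assumed truncation rule
    }
    return parts;
  }

  // Hypothetical stand-in for the coordinator loop: it cannot bound the
  // request by file count because it only knows partition ids, so it asks
  // for everything that remains. Returns the number of requests sent.
  public static int fetchAll(List<Integer> fileCountsPerPartition, int maxFiles) {
    int numFetchedParts = 0;
    int requests = 0;
    while (numFetchedParts < fileCountsPerPartition.size()) {
      List<Integer> remaining = fileCountsPerPartition.subList(
          numFetchedParts, fileCountsPerPartition.size());
      numFetchedParts += serve(remaining, maxFiles);
      requests++;
    }
    return requests;
  }

  public static void main(String[] args) {
    // Seven partitions, several of them empty, with a limit of 5 files.
    System.out.println(fetchAll(List.of(0, 0, 5, 0, 3, 4, 0), 5));
  }
}
```

In this sketch, empty partitions cost nothing against the limit, which is why the number of ids in a request says little about the size of the response.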



--
To view, visit http://gerrit.cloudera.org:8080/22559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb13fec20de5a17e7fc33613ca5cdebb9ac1a1e5
Gerrit-Change-Number: 22559
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Comment-Date: Wed, 19 Mar 2025 08:18:04 +0000
Gerrit-HasComments: Yes
