Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/22559 )
Change subject: IMPALA-11402: Add limit on files fetched by a single getPartialCatalogObject request
......................................................................


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/22559/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/22559/1//COMMIT_MSG@28
PS1, Line 28: Choose 1000000 as the default value for now. We can tune it in the
> That's a pretty significant difference. If that is the overall time, we sho

Sorry, this is not the overall time, just the duration of a single RPC. E.g. with the limit set to 1M, a table of 6M files and partitions requires 6 round trips. Each takes 1s487ms on the catalogd side, so the total time on the catalogd side is 1s487ms * 6 = 8s922ms. Translated to overall time on the catalogd side, the limits and their corresponding times are:

* 1M: 8s922ms
* 2M: 12s105ms
* 3M: 13s286ms

Setting a smaller limit requires more requests and acquiring the table lock more times. If the table changes (by DDL/REFRESH) between requests, the coordinator has to resend some of them.

I'll update the limit to 1M for now and tune it later. Since it's configurable, we don't need to wait too long for the tests here.


http://gerrit.cloudera.org:8080/#/c/22559/1/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
File fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java:

http://gerrit.cloudera.org:8080/#/c/22559/1/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@1105
PS1, Line 1105: ids.size() - numFetchedParts);
> Can this be bounded by backendCfg_.catalog_partial_fetch_max_files to not p

These are partition ids. Before sending the request, the coordinator doesn't know how many files are in them. Some partitions might be empty, and some might have lots of files. Catalogd truncates the response based on the files actually collected.
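
To illustrate the truncation and the resulting number of round trips, here is a minimal standalone sketch. It is not the code in the patch: the class PartialFetchSketch and the methods serveRequest and countRoundTrips are hypothetical names, and the rule that every response carries at least one partition is an assumption made so the loop always makes progress.

```java
import java.util.ArrayList;
import java.util.List;

public class PartialFetchSketch {
  /**
   * Hypothetical stand-in for catalogd: starting at partition index 'from',
   * returns how many partitions fit into one response without exceeding
   * maxFiles files. Assumption: at least one partition is always returned.
   */
  public static int serveRequest(List<Integer> fileCounts, int from, int maxFiles) {
    int files = 0;
    int returned = 0;
    for (int i = from; i < fileCounts.size(); i++) {
      // Truncate once adding the next partition would exceed the limit.
      if (returned > 0 && files + fileCounts.get(i) > maxFiles) break;
      files += fileCounts.get(i);
      returned++;
    }
    return returned;
  }

  /**
   * Hypothetical stand-in for the coordinator: keeps requesting the
   * remaining partition ids until all of them have been fetched.
   */
  public static int countRoundTrips(List<Integer> fileCounts, int maxFiles) {
    int numFetchedParts = 0;
    int roundTrips = 0;
    while (numFetchedParts < fileCounts.size()) {
      numFetchedParts += serveRequest(fileCounts, numFetchedParts, maxFiles);
      roundTrips++;
    }
    return roundTrips;
  }

  public static void main(String[] args) {
    // Scaled-down version of the 6M-file table: 6 partitions of 1000 files.
    List<Integer> parts = new ArrayList<>();
    for (int i = 0; i < 6; i++) parts.add(1000);
    System.out.println(countRoundTrips(parts, 1000));
  }
}
```

With six partitions of 1000 files each and a limit of 1000, the sketch needs 6 round trips, mirroring the 6M files at a 1M limit case above. The coordinator only learns how many partitions were served after each response, which is why the request itself is sized by partition ids rather than by files.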
On the other hand, these ids are sent as a numeric list, so the request size is not that large compared to the response size.


-- 
To view, visit http://gerrit.cloudera.org:8080/22559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb13fec20de5a17e7fc33613ca5cdebb9ac1a1e5
Gerrit-Change-Number: 22559
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Comment-Date: Wed, 19 Mar 2025 08:18:04 +0000
Gerrit-HasComments: Yes