Hello Daniel Becker, Kurt Deschler, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/22559

to look at the new patch set (#9).

Change subject: IMPALA-11402: Add limit on files fetched by a single getPartialCatalogObject request
......................................................................

IMPALA-11402: Add limit on files fetched by a single getPartialCatalogObject request

For a table with a huge number of files (e.g. 6M), catalogd might hit an
OOM error by exceeding the JVM array limit when serializing the response
of a getPartialCatalogObject request for all partitions (and thus all
files).

This patch adds a new flag, catalog_partial_fetch_max_files, to define
the max number of file descriptors allowed in a response of
getPartialCatalogObject. Catalogd will truncate the response at the
partition level when it's too big, and only return a subset of the
requested partitions. The coordinator should send new requests to fetch
the remaining partitions.

Measurements of the serialized byte array size and the duration of a
single response, by number of files in the response:
 * 1000000: 371.71MB, 1s487ms
 * 2000000: 744.51MB, 4s035ms
 * 3000000: 1.09GB, 6s643ms
 * 4000000: 1.46GB, duration not measured due to GC pauses
 * 5000000: 1.82GB, duration not measured due to GC pauses
 * 6000000: >2GB (hit OOM)

Choose 1000000 as the default value for now. We can tune it in the
future.

Tests:
 - Added custom-cluster test
 - Ran e2e tests in local-catalog mode with
   catalog_partial_fetch_max_files=1000 so the new code is exercised.

Change-Id: Ibb13fec20de5a17e7fc33613ca5cdebb9ac1a1e5
---
M be/src/catalog/catalog-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M tests/custom_cluster/test_local_catalog.py
7 files changed, 161 insertions(+), 21 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/22559/9
--
To view, visit http://gerrit.cloudera.org:8080/22559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibb13fec20de5a17e7fc33613ca5cdebb9ac1a1e5
Gerrit-Change-Number: 22559
Gerrit-PatchSet: 9
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
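
For reviewers who want a quick mental model of the truncate-and-refetch
behaviour described in the commit message, here is a minimal Python
sketch. It is not the code in this patch: the names Partition,
serve_partial_fetch and fetch_all_partitions are hypothetical, and it
assumes that catalogd returns a prefix of the requested partitions and
always returns at least one partition, even when that partition alone
exceeds the limit. The actual patch may handle both points differently.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Partition:
    """Hypothetical stand-in for a partition and its file descriptors."""
    part_id: int
    num_files: int


def serve_partial_fetch(table: Dict[int, Partition],
                        requested_ids: List[int],
                        max_files: int) -> List[Partition]:
    """Catalogd side: add whole partitions until the file budget is spent.

    Assumption of this sketch: the first partition is always returned, so
    the caller makes progress even if it alone exceeds max_files.
    """
    response: List[Partition] = []
    files_so_far = 0
    for part_id in requested_ids:
        part = table[part_id]
        if response and files_so_far + part.num_files > max_files:
            break  # truncate at a partition boundary
        response.append(part)
        files_so_far += part.num_files
    return response


def fetch_all_partitions(table: Dict[int, Partition],
                         requested_ids: List[int],
                         max_files: int) -> Tuple[List[Partition], int]:
    """Coordinator side: re-request whatever the last response left out."""
    remaining = list(requested_ids)
    fetched: List[Partition] = []
    num_requests = 0
    while remaining:
        batch = serve_partial_fetch(table, remaining, max_files)
        num_requests += 1
        fetched.extend(batch)
        received = {p.part_id for p in batch}
        remaining = [i for i in remaining if i not in received]
    return fetched, num_requests
```

With five partitions of 400 files each and a limit of 1000 files, each
response in this sketch carries at most two partitions, so the
coordinator needs three requests to collect all five.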