Hello Daniel Becker, Kurt Deschler, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/22559

to look at the new patch set (#9).

Change subject: IMPALA-11402: Add limit on files fetched by a single 
getPartialCatalogObject request
......................................................................

IMPALA-11402: Add limit on files fetched by a single getPartialCatalogObject 
request

For a table with a huge number of files (e.g. 6M), catalogd might hit an
OOM by exceeding the JVM array size limit when serializing the response
to a getPartialCatalogObject request for all partitions (thus all files).

This patch adds a new flag, catalog_partial_fetch_max_files, that defines
the max number of file descriptors allowed in a getPartialCatalogObject
response. When a response would be too big, catalogd truncates it at the
partition level and returns only a subset of the requested partitions.
The coordinator should then send new requests to fetch the remaining
partitions.
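
The truncate-and-refetch flow described above can be sketched roughly as
follows. This is only an illustration, not the code in this patch: the
function names, the in-memory dict of file counts, and the rule that at
least one partition is always returned (so the caller keeps making
progress) are all assumptions.

```python
from typing import Dict, List


def truncate_partitions(requested: List[str],
                        files_per_partition: Dict[str, int],
                        max_files: int) -> List[str]:
    """Hypothetical catalogd side: return a leading subset of the requested
    partitions whose total file count stays within max_files."""
    returned: List[str] = []
    total = 0
    for part in requested:
        n = files_per_partition[part]
        # Assumption: always return at least one partition, even if it alone
        # exceeds the limit, so the caller cannot loop forever.
        if returned and total + n > max_files:
            break
        returned.append(part)
        total += n
    return returned


def fetch_all(requested: List[str],
              files_per_partition: Dict[str, int],
              max_files: int) -> List[List[str]]:
    """Hypothetical coordinator side: keep requesting the partitions that
    were not returned until none remain. Returns one list per response."""
    remaining = list(requested)
    batches: List[List[str]] = []
    while remaining:
        got = truncate_partitions(remaining, files_per_partition, max_files)
        batches.append(got)
        got_set = set(got)
        remaining = [p for p in remaining if p not in got_set]
    return batches


# Example with a limit of 1000 files per response.
file_counts = {"p1": 400, "p2": 500, "p3": 300, "p4": 1500}
batches = fetch_all(["p1", "p2", "p3", "p4"], file_counts, max_files=1000)
```

In this sketch a partition is never split across responses, which matches
the partition-level truncation described above; a single partition with
more files than the limit is still returned whole.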

Here are some measurements for a single response: the number of files
it contains, the size of the serialized byte array, and the duration:
 * 1000000: 371.71MB, 1s487ms
 * 2000000: 744.51MB, 4s035ms
 * 3000000: 1.09GB, 6s643ms
 * 4000000: 1.46GB, duration not measured due to GC pauses
 * 5000000: 1.82GB, duration not measured due to GC pauses
 * 6000000: >2GB (hit OOM)
The default value is set to 1000000 for now. We can tune it in the
future.
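
As a rough sanity check on the numbers above, the per-file size and the
file count at which a response would reach the JVM array limit can be
estimated as below. This assumes the reported sizes use binary units
(MiB/GiB) and that the limit is about 2^31 - 1 bytes; both are
assumptions, and the variable names are illustrative only.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

# (number of files, serialized response size in bytes), from the list above.
measurements = [
    (1_000_000, 371.71 * MIB),
    (2_000_000, 744.51 * MIB),
    (3_000_000, 1.09 * GIB),
    (4_000_000, 1.46 * GIB),
    (5_000_000, 1.82 * GIB),
]

# Average serialized bytes per file descriptor across the measurements.
bytes_per_file = [size / n for n, size in measurements]
avg_bytes_per_file = sum(bytes_per_file) / len(bytes_per_file)

# Assumed upper bound on a Java byte[] length.
JVM_MAX_ARRAY_BYTES = 2 ** 31 - 1

# Estimated file count at which one response reaches that bound.
max_files_before_limit = int(JVM_MAX_ARRAY_BYTES / avg_bytes_per_file)

# Fraction of the bound used by a response at the default of 1000000 files.
default_fraction = 1_000_000 * avg_bytes_per_file / JVM_MAX_ARRAY_BYTES
```

Under these assumptions each file costs roughly 390 bytes and the limit is
reached between 5M and 6M files, which is consistent with the OOM observed
at 6000000 files.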

Tests:
 - Added custom-cluster test
 - Ran e2e tests in local-catalog mode with
   catalog_partial_fetch_max_files=1000 so the new code is exercised.

Change-Id: Ibb13fec20de5a17e7fc33613ca5cdebb9ac1a1e5
---
M be/src/catalog/catalog-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M tests/custom_cluster/test_local_catalog.py
7 files changed, 161 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/22559/9
--
To view, visit http://gerrit.cloudera.org:8080/22559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibb13fec20de5a17e7fc33613ca5cdebb9ac1a1e5
Gerrit-Change-Number: 22559
Gerrit-PatchSet: 9
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
