Hello Daniel Becker, Kurt Deschler, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/22559

to look at the new patch set (#5).

Change subject: IMPALA-11402: Add limit on files fetched by a single 
getPartialCatalogObject request
......................................................................

IMPALA-11402: Add limit on files fetched by a single getPartialCatalogObject 
request

For a table with a huge number of files (e.g. 6M), catalogd might hit an
OOM error caused by exceeding the JVM array size limit when serializing
the response of a getPartialCatalogObject request for all partitions (and
thus all files).

This patch adds a new flag, catalog_partial_fetch_max_files, to define
the max number of file descriptors allowed in a response of
getPartialCatalogObject. Catalogd will truncate the response at the
partition level when it's too big. In this case, some partitions will be
missing from the response, and the coordinator should send new requests
to fetch the missing partitions.
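
The truncate-and-refetch flow described above can be sketched roughly as
follows. This is a simplified Python model, not the code in HdfsTable.java
or CatalogdMetaProvider.java: the function names, the dict-based table
layout, and the rule that at least one partition is always returned (so the
caller keeps making progress) are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for the catalog_partial_fetch_max_files flag.
MAX_FILES = 5


def serve_partial_fetch(table: Dict[int, List[str]],
                        requested_ids: List[int],
                        max_files: int = MAX_FILES) -> Dict[int, List[str]]:
    """Server side: return whole partitions until the file limit is reached."""
    response: Dict[int, List[str]] = {}
    num_files = 0
    for pid in requested_ids:
        files = table[pid]
        # Assumption: always return at least one partition, even if it alone
        # exceeds the limit, so that every request makes progress.
        if response and num_files + len(files) > max_files:
            break  # Truncate at partition granularity; never split a partition.
        response[pid] = files
        num_files += len(files)
    return response


def fetch_all_partitions(table: Dict[int, List[str]],
                         requested_ids: List[int],
                         max_files: int = MAX_FILES
                         ) -> Tuple[Dict[int, List[str]], int]:
    """Client side: keep requesting the partitions missing from each response."""
    result: Dict[int, List[str]] = {}
    pending = list(requested_ids)
    rounds = 0
    while pending:
        resp = serve_partial_fetch(table, pending, max_files)
        result.update(resp)
        # Anything not returned was truncated and must be requested again.
        pending = [pid for pid in pending if pid not in resp]
        rounds += 1
    return result, rounds


if __name__ == "__main__":
    table = {1: ["a", "b", "c"], 2: ["d", "e", "f"], 3: ["g", "h"],
             4: ["i", "j", "k", "l"]}
    result, rounds = fetch_all_partitions(table, [1, 2, 3, 4], max_files=5)
    print(sorted(result), rounds)
```

In this model every response contains only complete partitions, so the
caller can detect truncation simply by comparing the returned partition IDs
against the requested ones.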

Here are some measurements for different numbers of files in a single
response, with the corresponding byte array size and response duration:
 * 1000000: 371.71MB, 1s487ms
 * 2000000: 744.51MB, 4s035ms
 * 3000000: 1.09GB, 6s643ms
 * 4000000: 1.46GB, duration not measured due to GC pauses
 * 5000000: 1.82GB, duration not measured due to GC pauses
 * 6000000: >2GB (hit OOM)
This patch uses 1000000 as the default value for now. It can be tuned in
the future.
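
As a rough cross-check of the numbers above, the per-file cost can be
extrapolated from the 1000000-file measurement and compared against the
upper bound on a Java byte array. The sketch below assumes that "MB" in the
measurements means MiB and that the response size grows linearly with the
number of files; neither assumption is stated in this change. The bound
used is 2^31 - 1 bytes because Java arrays are indexed by int; the
practical JVM limit can be slightly lower.

```python
# Upper bound on a Java byte[] length: arrays are int-indexed.
JVM_MAX_ARRAY_BYTES = 2**31 - 1

MIB = 1024 ** 2  # Assumption: "MB" in the measurements means MiB.

# Derived from the measured point: 1000000 files -> 371.71MB.
BYTES_PER_FILE = 371.71 * MIB / 1_000_000


def estimated_response_bytes(num_files: int) -> float:
    """Linear extrapolation of the serialized response size (an assumption)."""
    return num_files * BYTES_PER_FILE


def fits_in_java_array(num_files: int) -> bool:
    """True if the estimated response fits in a single Java byte array."""
    return estimated_response_bytes(num_files) <= JVM_MAX_ARRAY_BYTES


if __name__ == "__main__":
    for n in (1_000_000, 5_000_000, 6_000_000):
        size_gib = estimated_response_bytes(n) / 1024 ** 3
        print(f"{n}: ~{size_gib:.2f}GiB, fits={fits_in_java_array(n)}")
```

Under these assumptions the estimate for 5000000 files stays below the
bound while the estimate for 6000000 files exceeds it, which matches the
observed failure at 6000000 files.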

Tests:
 - Added custom-cluster test
 - Ran e2e tests in local-catalog mode with
   catalog_partial_fetch_max_files=1000 so the new code paths are exercised.

Change-Id: Ibb13fec20de5a17e7fc33613ca5cdebb9ac1a1e5
---
M be/src/catalog/catalog-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M tests/custom_cluster/test_local_catalog.py
7 files changed, 162 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/22559/5
--
To view, visit http://gerrit.cloudera.org:8080/22559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibb13fec20de5a17e7fc33613ca5cdebb9ac1a1e5
Gerrit-Change-Number: 22559
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
