Hello Quanlong Huang, Daniel Becker, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/23175

to look at the new patch set (#4).

Change subject: IMPALA-14138: Manually disable block location loading via Hadoop config
......................................................................

IMPALA-14138: Manually disable block location loading via Hadoop config

For storage systems that support block location information (HDFS,
Ozone) we always retrieve it with the assumption that we can use it for
scheduling, to do local reads. But it's also typical that Impala is not
co-located with the storage system, not even in on-prem deployments.
E.g. when Impala runs in containers, and even if they are co-located,
we don't try to figure out which container runs on which machine.

In such cases we should not reach out to the storage system to collect
file information because it can be very expensive for large tables and
we won't benefit from it at all. Since currently there is no easy way
to tell if Impala is co-located with the storage system this patch
adds configuration options to disable block location retrieval during
table loading.

It can be disabled globally via Hadoop Configuration:

  'impala.preload-block-locations-for-scheduling': 'false'

We can restrict it to filesystem schemes, e.g.:

  'impala.preload-block-locations-for-scheduling.scheme.hdfs': 'false'

When multiple storage systems are configured with the same scheme, we
can still control block location loading based on authority, e.g.:

  'impala.preload-block-locations-for-scheduling.authority.mycluster': 'false'

The latter only disables block location loading for URIs like
'hdfs://mycluster/warehouse/tablespace/...'

If block location loading is disabled by any of the switches, it cannot
be re-enabled by another, i.e. the most restrictive setting prevails.
E.g.:
  disable scheme 'hdfs', enable authority 'mycluster'
     ==> hdfs://mycluster/ is still disabled

  disable globally, enable scheme 'hdfs', enable authority 'mycluster'
     ==> hdfs://mycluster/ is still disabled, as everything else is.

Testing:
 * added unit tests for FileSystemUtil
 * added unit tests for the file metadata loaders
 * custom cluster tests with custom Hadoop configuration

Change-Id: I1c7a6a91f657c99792db885991b7677d2c240867
---
M bin/create-test-configuration.sh
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
M fe/src/test/java/org/apache/impala/common/FileSystemUtilTest.java
M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py
A testdata/workloads/functional-query/queries/QueryTest/no-block-locations.test
M tests/common/custom_cluster_test_suite.py
A tests/custom_cluster/test_disabled_block_locations.py
8 files changed, 294 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/23175/4
--
To view, visit http://gerrit.cloudera.org:8080/23175
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1c7a6a91f657c99792db885991b7677d2c240867
Gerrit-Change-Number: 23175
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
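
Illustration of the "most restrictive setting prevails" rule from the commit message: the sketch below is not the code in FileSystemUtil.java, only a rough Python model of the described behaviour. The function names, the use of a plain dict in place of a Hadoop Configuration, and the assumption that an unset key counts as enabled are all hypothetical. Only the three configuration key patterns come from the commit message.

```python
from urllib.parse import urlparse

# Key prefix taken from the commit message.
PREFIX = "impala.preload-block-locations-for-scheduling"


def _enabled(conf, key):
    # Assumption: an unset key means "enabled"; only the value 'false' disables.
    return conf.get(key, "true").strip().lower() != "false"


def should_load_block_locations(conf, uri):
    """Return True only if no global, scheme, or authority switch disables loading."""
    parsed = urlparse(uri)
    keys = [PREFIX]  # global switch
    if parsed.scheme:
        keys.append(f"{PREFIX}.scheme.{parsed.scheme}")
    if parsed.netloc:
        # Assumption: the authority is matched exactly as it appears in the URI.
        keys.append(f"{PREFIX}.authority.{parsed.netloc}")
    # Any single 'false' wins; an 'enabled' switch cannot override it.
    return all(_enabled(conf, k) for k in keys)


if __name__ == "__main__":
    # First example from the commit message: scheme disabled, authority enabled.
    conf = {
        f"{PREFIX}.scheme.hdfs": "false",
        f"{PREFIX}.authority.mycluster": "true",
    }
    print(should_load_block_locations(conf, "hdfs://mycluster/warehouse/tablespace/t"))
```

Because the switches are combined with a logical AND, the order in which they are checked does not matter, which matches the statement that a disabled switch cannot be re-enabled by another.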