Peter Rozsa has uploaded this change for review. ( http://gerrit.cloudera.org:8080/20700 )
Change subject: IMPALA-12299: Parallelize file listings of Iceberg tables on HDFS/Ozone
......................................................................

IMPALA-12299: Parallelize file listings of Iceberg tables on HDFS/Ozone

This change replaces the single-threaded file metadata listing of
Iceberg data files with a thread-pool-based, multithreaded one. The
thread-pool size is calculated from the filesystem's type and is
capped by MAX_HDFS_PARTITIONS_PARALLEL_LOAD or
MAX_NON_HDFS_PARTITIONS_PARALLEL_LOAD accordingly.

The parallel tasks are created from the parent directories of the
data files, which guarantees that every data file is listed.

Manually executed benchmarks on a table with 280,000 files show a
3-4x speedup.

Tests: ran the exhaustive test suite

Change-Id: Ic5ca7e873f4ad0cc8dab6a77b62e05d965b4a76d
---
M fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java
2 files changed, 60 insertions(+), 19 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/20700/2
--
To view, visit http://gerrit.cloudera.org:8080/20700
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic5ca7e873f4ad0cc8dab6a77b62e05d965b4a76d
Gerrit-Change-Number: 20700
Gerrit-PatchSet: 2
Gerrit-Owner: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
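
For readers who have not opened the diff, the approach described in the commit message can be sketched roughly as below. This is not the code from the patch: the class name, the method names, the two cap values, and the string-based "lister" callback are all assumptions made for illustration. In the real change the caps come from MAX_HDFS_PARTITIONS_PARALLEL_LOAD and MAX_NON_HDFS_PARTITIONS_PARALLEL_LOAD, and the listing goes through the filesystem API.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelListingSketch {
  // Hypothetical stand-ins for the two caps named in the commit message.
  // The numeric values are assumptions, not taken from the patch.
  public static final int MAX_HDFS_PARALLEL_LOAD = 5;
  public static final int MAX_NON_HDFS_PARALLEL_LOAD = 20;

  // Pool size: one thread per directory, capped by filesystem type, at least 1.
  public static int poolSize(int numDirs, boolean isHdfs) {
    int cap = isHdfs ? MAX_HDFS_PARALLEL_LOAD : MAX_NON_HDFS_PARALLEL_LOAD;
    return Math.max(1, Math.min(numDirs, cap));
  }

  // Collects the distinct parent directories of the given data file paths.
  public static Set<String> parentDirs(Collection<String> dataFilePaths) {
    Set<String> dirs = new TreeSet<>();
    for (String path : dataFilePaths) {
      int idx = path.lastIndexOf('/');
      dirs.add(idx > 0 ? path.substring(0, idx) : "/");
    }
    return dirs;
  }

  // Submits one listing task per parent directory and waits for all of them.
  // 'lister' is a hypothetical callback that returns the entries of a directory.
  public static Map<String, List<String>> listInParallel(
      Collection<String> dataFilePaths, boolean isHdfs,
      Function<String, List<String>> lister) {
    Set<String> dirs = parentDirs(dataFilePaths);
    ExecutorService pool = Executors.newFixedThreadPool(poolSize(dirs.size(), isHdfs));
    try {
      Map<String, Future<List<String>>> futures = new LinkedHashMap<>();
      for (String dir : dirs) {
        Callable<List<String>> task = () -> new ArrayList<>(lister.apply(dir));
        futures.put(dir, pool.submit(task));
      }
      Map<String, List<String>> result = new LinkedHashMap<>();
      for (Map.Entry<String, Future<List<String>>> e : futures.entrySet()) {
        result.put(e.getKey(), e.getValue().get());
      }
      return result;
    } catch (Exception e) {
      // A failed or interrupted listing aborts the whole load in this sketch.
      throw new RuntimeException("Directory listing failed", e);
    } finally {
      pool.shutdown();
    }
  }
}
```

Because every data file has exactly one parent directory, listing each distinct parent once covers every data file, which matches the guarantee stated in the commit message.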
