Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/20700 )

Change subject: IMPALA-12299: Parallelize file listings of Iceberg tables on HDFS/Ozone
......................................................................

IMPALA-12299: Parallelize file listings of Iceberg tables on HDFS/Ozone

This change replaces the single-threaded file metadata listing of
Iceberg data files with a multithreaded, pool-based solution. The
thread-pool size is determined by the filesystem type and is capped by
MAX_HDFS_PARTITIONS_PARALLEL_LOAD and
MAX_NON_HDFS_PARTITIONS_PARALLEL_LOAD. The parallel tasks are created
from the parent directories of the data files, which guarantees that
every data file is listed.
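A minimal Java sketch of the approach described above, under stated
assumptions: a plain Function<String, List<String>> stands in for the
filesystem listing call, and the class name ParallelListingSketch, its
methods, and the non-HDFS cap of 20 are hypothetical. It is not the code
in ParallelFileMetadataLoader; it only shows the idea of grouping data
files by parent directory, sizing a fixed pool as
min(#directories, per-filesystem cap), and listing each directory in its
own task.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelListingSketch {
  // Hypothetical caps mirroring MAX_HDFS_PARTITIONS_PARALLEL_LOAD and
  // MAX_NON_HDFS_PARTITIONS_PARALLEL_LOAD. 5 is the HDFS default cited in
  // the benchmark; 20 for non-HDFS filesystems is an assumption.
  static final int MAX_HDFS_PARALLEL = 5;
  static final int MAX_NON_HDFS_PARALLEL = 20;

  // Never more threads than directories, capped by filesystem type.
  static int poolSize(int numDirs, boolean isHdfs) {
    int cap = isHdfs ? MAX_HDFS_PARALLEL : MAX_NON_HDFS_PARALLEL;
    return Math.max(1, Math.min(numDirs, cap));
  }

  // Distinct parent directories of the data files, in first-seen order.
  static Set<String> parentDirs(List<String> dataFiles) {
    Set<String> dirs = new LinkedHashSet<>();
    for (String f : dataFiles) {
      int slash = f.lastIndexOf('/');
      dirs.add(slash > 0 ? f.substring(0, slash) : "/");
    }
    return dirs;
  }

  // Lists every parent directory in a fixed-size pool, one task per directory.
  static Map<String, List<String>> listInParallel(List<String> dataFiles,
      boolean isHdfs, Function<String, List<String>> lister)
      throws InterruptedException, ExecutionException {
    Set<String> dirs = parentDirs(dataFiles);
    ExecutorService pool =
        Executors.newFixedThreadPool(poolSize(dirs.size(), isHdfs));
    try {
      Map<String, Future<List<String>>> futures = new LinkedHashMap<>();
      for (String dir : dirs) {
        Callable<List<String>> task = () -> lister.apply(dir);
        futures.put(dir, pool.submit(task));
      }
      // Wait for all listings; get() rethrows any task failure.
      Map<String, List<String>> result = new LinkedHashMap<>();
      for (Map.Entry<String, Future<List<String>>> e : futures.entrySet()) {
        result.put(e.getKey(), e.getValue().get());
      }
      return result;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    List<String> files = List.of(
        "/warehouse/t/data/p=1/a.parquet",
        "/warehouse/t/data/p=2/b.parquet",
        "/warehouse/t/data/p=2/c.parquet");
    // Fake lister: returns the known files located directly under 'dir'.
    Function<String, List<String>> fakeLister = dir -> {
      List<String> out = new ArrayList<>();
      for (String f : files) {
        if (f.startsWith(dir + "/") && f.indexOf('/', dir.length() + 1) < 0) {
          out.add(f);
        }
      }
      return out;
    };
    System.out.println(listInParallel(files, true, fakeLister));
    System.out.println(poolSize(280_000, true));
  }
}
```

Listing by parent directory means one listing call covers every data
file in that directory. In the worst case from the benchmark (280,000
single-file partitions) this still yields one task per partition, but
at most 5 of them run concurrently on HDFS.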

Benchmarks were executed manually with the following properties:
 - 280,000 partitions with 1 file each (worst case)
 - Thread-pool size of 5 (the default value for HDFS)
 - The minicluster setup was used as the test bench

The results showed a 3-4x speedup of getFileStatuses() and roughly
halved the total query time (with this change vs. without):
 - Self-time of getFileStatuses(): 6.599 ms vs 25.399 ms
 - Query time: 16.37 s vs 34.00 s
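The speedup figures follow directly from the measurements above. A small
Java check, assuming the first value in each pair is the measurement with
this change (the class name BenchmarkRatios is hypothetical):

```java
public class BenchmarkRatios {
  // Speedup = time before the change / time after the change.
  static double speedup(double before, double after) {
    return before / after;
  }

  public static void main(String[] args) {
    // Values copied from the benchmark results.
    System.out.printf("getFileStatuses() self-time: %.2fx%n",
        speedup(25.399, 6.599));
    System.out.printf("query time: %.2fx%n", speedup(34.00, 16.37));
  }
}
```

This gives about 3.85x for the getFileStatuses() self-time and about
2.08x for the total query time.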

Tests: ran the exhaustive test suite

Change-Id: Ic5ca7e873f4ad0cc8dab6a77b62e05d965b4a76d
Reviewed-on: http://gerrit.cloudera.org:8080/20700
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
---
M fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java
2 files changed, 59 insertions(+), 19 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/20700

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ic5ca7e873f4ad0cc8dab6a77b62e05d965b4a76d
Gerrit-Change-Number: 20700
Gerrit-PatchSet: 6
Gerrit-Owner: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
