Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/20700 )
Change subject: IMPALA-12299: Parallelize file listings of Iceberg tables on HDFS/Ozone
......................................................................

IMPALA-12299: Parallelize file listings of Iceberg tables on HDFS/Ozone

This change replaces the single-threaded file metadata listing of Iceberg
data files with a multithreaded, pool-based solution. The thread-pool size
is determined by the filesystem type and is capped by
MAX_HDFS_PARTITIONS_PARALLEL_LOAD and MAX_NON_HDFS_PARTITIONS_PARALLEL_LOAD.
The parallel tasks are created from the parent directories of the data
files, which guarantees that every data file is listed.

Manually executed benchmarks with the following properties:
 - 280,000 partitions, 1 file each (worst case)
 - Thread pool size of 5 (the default value for HDFS)
 - Used the minicluster setup as the test bench

The results showed a 3-4x improvement for getFileStatuses():
 - Self-time of getFileStatuses: 6.599 ms vs 25.399 ms
 - Query time: 16.37 s vs 34.00 s

Tests:
 - Ran the exhaustive test suite

Change-Id: Ic5ca7e873f4ad0cc8dab6a77b62e05d965b4a76d
Reviewed-on: http://gerrit.cloudera.org:8080/20700
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
---
M fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java
2 files changed, 59 insertions(+), 19 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/20700
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ic5ca7e873f4ad0cc8dab6a77b62e05d965b4a76d
Gerrit-Change-Number: 20700
Gerrit-PatchSet: 6
Gerrit-Owner: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
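
The commit message for IMPALA-12299 describes the parallel listing only at a high level. What follows is a minimal Java sketch of that idea under stated assumptions. It is not the merged patch. Data files are grouped by parent directory, and one listing task per directory is submitted to a fixed-size pool. The pool size is the number of directories, capped by a per-filesystem limit. The class ParallelListingSketch, its methods, and the constants MAX_HDFS_PARALLEL and MAX_NON_HDFS_PARALLEL are hypothetical; the constants stand in for MAX_HDFS_PARTITIONS_PARALLEL_LOAD and MAX_NON_HDFS_PARTITIONS_PARALLEL_LOAD. Listing a local directory with java.nio.file stands in for listing on HDFS/Ozone.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch of per-directory parallel listing; not Impala's code. */
class ParallelListingSketch {
  // Hypothetical caps standing in for MAX_HDFS_PARTITIONS_PARALLEL_LOAD and
  // MAX_NON_HDFS_PARTITIONS_PARALLEL_LOAD. 5 mirrors the HDFS default quoted
  // in the benchmark notes; 20 is an arbitrary placeholder.
  static final int MAX_HDFS_PARALLEL = 5;
  static final int MAX_NON_HDFS_PARALLEL = 20;

  /** One thread per task, capped by the filesystem-specific limit. */
  static int poolSize(boolean isHdfs, int numTasks) {
    int cap = isHdfs ? MAX_HDFS_PARALLEL : MAX_NON_HDFS_PARALLEL;
    return Math.max(1, Math.min(cap, numTasks));
  }

  /**
   * Lists the parent directory of every data file in parallel and returns the
   * size of each requested data file. Paths are assumed to be absolute.
   */
  static Map<Path, Long> listDataFiles(List<Path> dataFiles, boolean isHdfs)
      throws InterruptedException, ExecutionException {
    Set<Path> wanted = new HashSet<>(dataFiles);
    // One task per distinct parent directory: every data file has a parent,
    // so every data file is covered by exactly one listing.
    Set<Path> parents = dataFiles.stream()
        .map(Path::getParent)
        .collect(Collectors.toCollection(LinkedHashSet::new));
    ExecutorService pool =
        Executors.newFixedThreadPool(poolSize(isHdfs, parents.size()));
    try {
      List<Future<Map<Path, Long>>> futures = new ArrayList<>();
      for (Path dir : parents) {
        futures.add(pool.submit(() -> listOneDir(dir, wanted)));
      }
      // Merge per-directory results; get() rethrows any listing failure.
      Map<Path, Long> result = new HashMap<>();
      for (Future<Map<Path, Long>> f : futures) {
        result.putAll(f.get());
      }
      return result;
    } finally {
      pool.shutdown();
    }
  }

  /** Lists one directory, keeping only entries that are known data files. */
  private static Map<Path, Long> listOneDir(Path dir, Set<Path> wanted)
      throws IOException {
    Map<Path, Long> sizes = new HashMap<>();
    try (Stream<Path> entries = Files.list(dir)) {
      for (Path p : (Iterable<Path>) entries::iterator) {
        if (wanted.contains(p)) {
          sizes.put(p, Files.size(p));
        }
      }
    }
    return sizes;
  }
}
```

Grouping by parent directory means a directory that holds many data files is listed once rather than once per file. Because every data file lives in some parent directory, no file can be missed. In the benchmarked worst case (one file per partition), the number of tasks equals the number of files, so the pool cap is what bounds concurrency.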
