Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20700 )

Change subject: IMPALA-12299: Parallelize file listings of Iceberg tables on 
HDFS/Ozone
......................................................................


Patch Set 2:

(5 comments)

Left some minor comments, otherwise the change looks great!

http://gerrit.cloudera.org:8080/#/c/20700/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20700/2//COMMIT_MSG@17
PS2, Line 17: 3-4x
            : speedup.
Could you please elaborate on this a little more? I think it's worth to 
highlight that this was in your minicluster environment (if I assume 
correctly). What was the partition:file ration? Did you use multi-level 
partitioning? Also, what was the size of the thread pool?


http://gerrit.cloudera.org:8080/#/c/20700/2/fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java
File fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java:

http://gerrit.cloudera.org:8080/#/c/20700/2/fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java@206
PS2, Line 206: {
nit: for single-line if statements we omit the braces.


http://gerrit.cloudera.org:8080/#/c/20700/2/fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java@239
PS2, Line 239: partitionPaths = new HashSet<>(pool.submit(this 
::icebergFilesByPartition).get());
Can we just invoke icebergFilesByPartition() in the current thread? Submitting 
it to the pool and getting it immediately doesn't seem to have any added value.


http://gerrit.cloudera.org:8080/#/c/20700/2/fe/src/main/java/org/apache/impala/catalog/IcebergFileMetadataLoader.java@247
PS2, Line 247: pool.shutdown();
nit: for readability, it worth to write it in its own line.


http://gerrit.cloudera.org:8080/#/c/20700/2/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java
File fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java:

http://gerrit.cloudera.org:8080/#/c/20700/2/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java@30
PS2, Line 30: import java.util.concurrent.ForkJoinPool;
The new imports seem to be unused.



--
To view, visit http://gerrit.cloudera.org:8080/20700
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5ca7e873f4ad0cc8dab6a77b62e05d965b4a76d
Gerrit-Change-Number: 20700
Gerrit-PatchSet: 2
Gerrit-Owner: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Tue, 21 Nov 2023 13:32:13 +0000
Gerrit-HasComments: Yes

Reply via email to