Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21959 )

Change subject: IMPALA-13370: Read Puffin stats from metadata.json property if 
available
......................................................................


Patch Set 1:

(5 comments)

Thanks for working on this!

http://gerrit.cloudera.org:8080/#/c/21959/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21959/1//COMMIT_MSG@21
PS1, Line 21: Puffin stats file
Please add a comment about the newly added stats file. AFAICS the stats file 
also contains the correct blob metadata. Can we also have a test where the 
stats file is complete gibberish?


http://gerrit.cloudera.org:8080/#/c/21959/1/fe/src/main/java/org/apache/impala/catalog/PuffinStatsLoader.java
File fe/src/main/java/org/apache/impala/catalog/PuffinStatsLoader.java:

http://gerrit.cloudera.org:8080/#/c/21959/1/fe/src/main/java/org/apache/impala/catalog/PuffinStatsLoader.java@83
PS1, Line 83:     for (StatisticsFile statsFile : statsFiles) {
            :       loader.loadStatsFromFile(statsFile);
            :     }
The logic looks good to me but I think it could be made much simpler by having 
two for-loops:

    for (StatisticsFile statsFile : statsFiles) {
      loader.loadStatsFromMetadataBlobs(statsFile);
    }
    for (StatisticsFile statsFile : statsFiles) {
      loader.loadStatsFromFile(statsFile);
    }

In the second loop we only load stats for field IDs that are not already in 
'result_', so there is no need for blob partitioning, etc.
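
A rough, self-contained sketch of the two-pass idea, only to illustrate the control flow. The types here are hypothetical stand-ins: 'StatsFileStub' and its two NDV maps are not the real Iceberg 'StatisticsFile' or Impala classes, and the actual loader would read blobs rather than maps.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TwoPassStatsSketch {
  /** Hypothetical stand-in for a statistics file: NDV values keyed by field ID. */
  static class StatsFileStub {
    final Map<Integer, Long> ndvInMetadata;  // assumed: values present in metadata.json
    final Map<Integer, Long> ndvInFile;      // assumed: values that need the Puffin file

    StatsFileStub(Map<Integer, Long> ndvInMetadata, Map<Integer, Long> ndvInFile) {
      this.ndvInMetadata = ndvInMetadata;
      this.ndvInFile = ndvInFile;
    }
  }

  private final Map<Integer, Long> result_ = new HashMap<>();

  // First pass: take whatever is available without opening the file.
  void loadStatsFromMetadataBlobs(StatsFileStub statsFile) {
    for (Map.Entry<Integer, Long> e : statsFile.ndvInMetadata.entrySet()) {
      result_.putIfAbsent(e.getKey(), e.getValue());
    }
  }

  // Second pass: only fill in field IDs that are still missing from result_.
  void loadStatsFromFile(StatsFileStub statsFile) {
    for (Map.Entry<Integer, Long> e : statsFile.ndvInFile.entrySet()) {
      if (result_.containsKey(e.getKey())) continue;
      result_.put(e.getKey(), e.getValue());
    }
  }

  static Map<Integer, Long> load(List<StatsFileStub> statsFiles) {
    TwoPassStatsSketch loader = new TwoPassStatsSketch();
    for (StatsFileStub statsFile : statsFiles) {
      loader.loadStatsFromMetadataBlobs(statsFile);
    }
    for (StatsFileStub statsFile : statsFiles) {
      loader.loadStatsFromFile(statsFile);
    }
    return loader.result_;
  }
}
```

With this shape, a value found in the metadata always wins over one read from a file, and the second pass never overwrites an existing entry.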


http://gerrit.cloudera.org:8080/#/c/21959/1/fe/src/main/java/org/apache/impala/catalog/PuffinStatsLoader.java@154
PS1, Line 154:   private static class BlobPartitioning {
Please add a comment describing this class and its fields.


http://gerrit.cloudera.org:8080/#/c/21959/1/fe/src/main/java/org/apache/impala/catalog/PuffinStatsLoader.java@168
PS1, Line 168:   private BlobPartitioning partitionMetadataBlobs(
Please add a comment.


http://gerrit.cloudera.org:8080/#/c/21959/1/fe/src/main/java/org/apache/impala/catalog/PuffinStatsLoader.java@264
PS1, Line 264:     if (blobMetadata.fields().size() != 1) return false;
Could you add a test where there is more than one field?



--
To view, visit http://gerrit.cloudera.org:8080/21959
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5e92056ce97c4849742db6309562af3b575f647b
Gerrit-Change-Number: 21959
Gerrit-PatchSet: 1
Gerrit-Owner: Daniel Becker <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Tue, 19 Nov 2024 16:45:44 +0000
Gerrit-HasComments: Yes
