Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21959 )
Change subject: IMPALA-13370: Read Puffin stats from metadata.json property if available ...................................................................... Patch Set 6: (7 comments) http://gerrit.cloudera.org:8080/#/c/21959/4/fe/src/main/java/org/apache/impala/catalog/PuffinStatsLoader.java File fe/src/main/java/org/apache/impala/catalog/PuffinStatsLoader.java: http://gerrit.cloudera.org:8080/#/c/21959/4/fe/src/main/java/org/apache/impala/catalog/PuffinStatsLoader.java@62 PS4, Line 62: public final boolean isFromMetadataJson > What is the relevance of having this field? Can we drop it? In the new patch set it's needed to determine whether we need to log a warning for duplicate stats when determining which blobs to read from a Puffin file. http://gerrit.cloudera.org:8080/#/c/21959/4/fe/src/main/java/org/apache/impala/catalog/PuffinStatsLoader.java@83 PS4, Line 83: for (StatisticsFile statsFile : statsFiles) { > Why do we maintain this map? Isn't it enough later to check that fieldId is We need a way to decide whether we need to open the Puffin file at all. For that, we want to know that it contains field ids for which there is no NDV property in the metadata. But we can go through the blobs from the metadata.json file again to create that list. Doing that in the new patch. http://gerrit.cloudera.org:8080/#/c/21959/4/fe/src/main/java/org/apache/impala/catalog/PuffinStatsLoader.java@85 PS4, Line 85: > Same here, I don't think we need to maintain this list. Done http://gerrit.cloudera.org:8080/#/c/21959/4/fe/src/main/java/org/apache/impala/catalog/PuffinStatsLoader.java@197 PS4, Line 197: > nit: indentation is off by 2 Done http://gerrit.cloudera.org:8080/#/c/21959/4/fe/src/main/java/org/apache/impala/catalog/PuffinStatsLoader.java@221 PS4, Line 221: > nit: needs +2 spaces Done http://gerrit.cloudera.org:8080/#/c/21959/4/fe/src/main/java/org/apache/impala/catalog/PuffinStatsLoader.java@268 PS4, Line 268: ie > nit: indentation is off by 2 Done http://gerrit.cloudera.org:8080/#/c/21959/4/java/puffin-data-generator/src/main/java/org/apache/impala/puffindatagenerator/PuffinDataGenerator.java File java/puffin-data-generator/src/main/java/org/apache/impala/puffindatagenerator/PuffinDataGenerator.java: http://gerrit.cloudera.org:8080/#/c/21959/4/java/puffin-data-generator/src/main/java/org/apache/impala/puffindatagenerator/PuffinDataGenerator.java@438 PS4, Line 438: > nit: redundant spaces Done -- To view, visit http://gerrit.cloudera.org:8080/21959 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5e92056ce97c4849742db6309562af3b575f647b Gerrit-Change-Number: 21959 Gerrit-PatchSet: 6 Gerrit-Owner: Daniel Becker <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Peter Rozsa <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Thu, 21 Nov 2024 13:06:42 +0000 Gerrit-HasComments: Yes
