Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21959 )
Change subject: IMPALA-13370: Read Puffin stats from metadata.json property if available ...................................................................... Patch Set 7: (6 comments) http://gerrit.cloudera.org:8080/#/c/21959/6/fe/src/main/java/org/apache/impala/catalog/PuffinStatsLoader.java File fe/src/main/java/org/apache/impala/catalog/PuffinStatsLoader.java: http://gerrit.cloudera.org:8080/#/c/21959/6/fe/src/main/java/org/apache/impala/catalog/PuffinStatsLoader.java@93 PS6, Line 93: : private void loadStatsFromMetadata(StatisticsFile statsFile) { : final long > No need to return this list anymore. Done http://gerrit.cloudera.org:8080/#/c/21959/6/fe/src/main/java/org/apache/impala/catalog/PuffinStatsLoader.java@127 PS6, Line 127: rack of the Ic > We don't have this parameter anymore Done http://gerrit.cloudera.org:8080/#/c/21959/6/fe/src/main/java/org/apache/impala/catalog/PuffinStatsLoader.java@133 PS6, Line 133: try { > No need to check NULL-ness. Done http://gerrit.cloudera.org:8080/#/c/21959/6/fe/src/main/java/org/apache/impala/catalog/PuffinStatsLoader.java@143 PS6, Line 143: : puffinReader.readAll(blobs)) { > Is it possible that the BlobMetadata in the Puffin file contains the NDV wh I don't think the spec explicitly prohibits it, but I don't think it's a use case for which we should add code complexity. http://gerrit.cloudera.org:8080/#/c/21959/6/fe/src/main/java/org/apache/impala/catalog/PuffinStatsLoader.java@182 PS6, Line 182: : return res; : } : : private long getNdvFromMetadata(org.apache.iceberg.BlobMetadata blob) { : Strin > Do we need this duplicate check? Aren't these already reported during the f In the first round we only registered NDVs in 'result_' where it was available in the metadata.json. If it was missing or invalid we didn't register it. It is possible that there are two Puffin files with NDVs for the same column, but neither of them (or only one of them) has a value in the metadata.json. This situation wouldn't be discovered in the first loop. http://gerrit.cloudera.org:8080/#/c/21959/6/fe/src/main/java/org/apache/impala/catalog/PuffinStatsLoader.java@201 PS6, Line 201: } > Maybe we could log the invalid "ndv" string. Done -- To view, visit http://gerrit.cloudera.org:8080/21959 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5e92056ce97c4849742db6309562af3b575f647b Gerrit-Change-Number: 21959 Gerrit-PatchSet: 7 Gerrit-Owner: Daniel Becker <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Peter Rozsa <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Fri, 22 Nov 2024 10:37:48 +0000 Gerrit-HasComments: Yes
