Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/21605 )
Change subject: IMPALA-13247: Support Reading Puffin files for the current snapshot ...................................................................... Patch Set 11: (2 comments) http://gerrit.cloudera.org:8080/#/c/21605/6/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java: http://gerrit.cloudera.org:8080/#/c/21605/6/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@471 PS6, Line 471: IcebergColumn col = getColumnByIcebergF > Do we cap the NDV stats to the number of rows in the case of HMS statistics There is a way to keep the testing simple while capping NDV to numRows: in the test, we can set the 'numRows' table property explicitly to a large value. However, I tried setting this table property, then setting NDV manually (as an HMS stat), and that is not capped. In this case I think it would be strange to cap the Puffin NDV value while not doing the same with the HMS NDV value. Probably we should leave it as it is in this patch and maybe open a Jira to cap the values in both cases. What do you think? http://gerrit.cloudera.org:8080/#/c/21605/9/java/puffin-data-generator/src/main/java/org/apache/impala/puffindatagenerator/PuffinDataGenerator.java File java/puffin-data-generator/src/main/java/org/apache/impala/puffindatagenerator/PuffinDataGenerator.java: http://gerrit.cloudera.org:8080/#/c/21605/9/java/puffin-data-generator/src/main/java/org/apache/impala/puffindatagenerator/PuffinDataGenerator.java@371 PS9, Line 371: > There are som other parameters of the PuffinWriter that we don't have cover I added "created-by" for all files. Added blob compression for some of the generated files. I also tried footer compression but that doesn't work yet, because the spec only allows LZ4 for footer compression but the Iceberg lib does not yet support LZ4. -- To view, visit http://gerrit.cloudera.org:8080/21605 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I50c1228988960a686d08a9b2942e01e366678866 Gerrit-Change-Number: 21605 Gerrit-PatchSet: 11 Gerrit-Owner: Daniel Becker <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]> Gerrit-Reviewer: Peter Rozsa <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Tue, 03 Sep 2024 15:14:55 +0000 Gerrit-HasComments: Yes
