Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/21605 )
Change subject: IMPALA-13247: Support Reading Puffin files for the current snapshot ...................................................................... Patch Set 8: Code-Review+1 (3 comments) http://gerrit.cloudera.org:8080/#/c/21605/6/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java: http://gerrit.cloudera.org:8080/#/c/21605/6/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@471 PS6, Line 471: IcebergColumn col = getColumnByIcebergF > I'm a little bit concern if ndv > cardinality. Some planner code make assum numRows are precise for Iceberg tables without delete files. If the table has delete files, it's not trivial to get a precise value as there can be duplicated delete records (should be rare in real life usage). http://gerrit.cloudera.org:8080/#/c/21605/6/fe/src/main/java/org/apache/impala/catalog/PuffinStatsLoader.java File fe/src/main/java/org/apache/impala/catalog/PuffinStatsLoader.java: http://gerrit.cloudera.org:8080/#/c/21605/6/fe/src/main/java/org/apache/impala/catalog/PuffinStatsLoader.java@70 PS6, Line 70: } : return result; : } catch (Exception e) { : // We add a catch-all clause here because Puffin stats loading shouldn't be an > This CATCH should normally never be triggered, but I added it in case an un In loadStatsFromFile() we swallow some exceptions so healthy puffin files can still contribute to 'results'. So it feels a bit inconsistent and overly strict to me as here we throw away all stats, even stats coming from healthy puffin files. The Iceberg library loves unchecked exceptions, so it's possible we throw away stats on some minor error. Serious errors like OutOfMemoryError will fly through anyway. http://gerrit.cloudera.org:8080/#/c/21605/8/fe/src/main/java/org/apache/impala/catalog/PuffinStatsLoader.java File fe/src/main/java/org/apache/impala/catalog/PuffinStatsLoader.java: http://gerrit.cloudera.org:8080/#/c/21605/8/fe/src/main/java/org/apache/impala/catalog/PuffinStatsLoader.java@75 PS8, Line 75: "Unexpected error happened while loading Puffin column statistics." Please also log the exception. -- To view, visit http://gerrit.cloudera.org:8080/21605 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I50c1228988960a686d08a9b2942e01e366678866 Gerrit-Change-Number: 21605 Gerrit-PatchSet: 8 Gerrit-Owner: Daniel Becker <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]> Gerrit-Reviewer: Peter Rozsa <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Thu, 22 Aug 2024 16:03:00 +0000 Gerrit-HasComments: Yes
