Daniel Becker has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/21959 )
Change subject: IMPALA-13370: Read Puffin stats from metadata.json property if available
......................................................................

IMPALA-13370: Read Puffin stats from metadata.json property if available

When Trino writes Puffin stats for a column, it includes the NDV as a
property (with key "ndv") in the "statistics" section of the
metadata.json file, in addition to the Theta sketch in the Puffin file.
When we are only reading the stats and not writing/updating them, it is
enough to read this property if it is present.

After this change, Impala only opens and reads a Puffin stats file if it
contains stats for at least one column for which the "ndv" property is
not set in the metadata.json file.

Testing:
 - added a test in test_iceberg_with_puffin.py that verifies that the
   Puffin stats file is not read if the metadata.json file contains the
   NDV property. It uses the newly added stats file with corrupt
   datasketches: 'metadata_ndv_ok_sketches_corrupt.stats'.

Change-Id: I5e92056ce97c4849742db6309562af3b575f647b
---
M fe/src/main/java/org/apache/impala/catalog/PuffinStatsLoader.java
M java/puffin-data-generator/src/main/java/org/apache/impala/puffindatagenerator/PuffinDataGenerator.java
A testdata/ice_puffin/generated/metadata_ndv_ok_sketches_corrupt.stats
A testdata/ice_puffin/generated/metadata_ndv_ok_stats_file_corrupt.metadata.json
A testdata/ice_puffin/generated/multiple_field_ids.metadata.json
A testdata/ice_puffin/generated/multiple_field_ids.stats
M tests/custom_cluster/test_iceberg_with_puffin.py
7 files changed, 645 insertions(+), 46 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/21959/4
--
To view, visit http://gerrit.cloudera.org:8080/21959
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5e92056ce97c4849742db6309562af3b575f647b
Gerrit-Change-Number: 21959
Gerrit-PatchSet: 4
Gerrit-Owner: Daniel Becker <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
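
For readers of the commit message above, the rule "only open the Puffin
file if at least one column lacks the "ndv" property" can be sketched
roughly as follows. This is an illustration only, not the code in
PuffinStatsLoader.java: the function names are hypothetical, and the
input is assumed to be a simplified list of blob-metadata entries from
the "statistics" section of metadata.json, each with a "fields" list of
field ids and an optional string-valued "properties" map.

```python
from typing import Dict, List, Tuple

NDV_KEY = "ndv"  # property key named in the commit message


def split_by_ndv_source(blob_metadata: List[dict]) -> Tuple[Dict[int, int], List[int]]:
    """Hypothetical helper: separate field ids whose NDV is already in
    metadata.json from those that would require reading the Puffin file."""
    ndv_by_field: Dict[int, int] = {}
    needs_puffin: List[int] = []
    for blob in blob_metadata:
        # Assumption: "properties" may be missing or null for some blobs.
        props = blob.get("properties") or {}
        for field_id in blob.get("fields", []):
            if NDV_KEY in props:
                # Assumption: property values are stored as strings.
                ndv_by_field[field_id] = int(props[NDV_KEY])
            else:
                needs_puffin.append(field_id)
    return ndv_by_field, needs_puffin


def must_read_puffin_file(blob_metadata: List[dict]) -> bool:
    """The Puffin file is needed only if some column has no "ndv" property."""
    _, needs_puffin = split_by_ndv_source(blob_metadata)
    return len(needs_puffin) > 0
```

With this sketch, a stats file whose blobs all carry "ndv" is never
opened, which is the situation the new test exercises with a stats file
containing corrupt sketches.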
