Daniel Becker has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/21959 )

Change subject: IMPALA-13370: Read Puffin stats from metadata.json property if available
......................................................................

IMPALA-13370: Read Puffin stats from metadata.json property if available

When Trino writes Puffin stats for a column, it includes the NDV as a
property (with key "ndv") in the "statistics" section of the
metadata.json file, in addition to the Theta sketch in the Puffin file.
When we are only reading the stats and not writing/updating them, it is
enough to read this property if it is present.
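As an illustration of where the "ndv" property lives, the following is a
minimal Python sketch (not the PuffinStatsLoader code) that collects NDVs
from the "statistics" section of a metadata.json file. It assumes the
Iceberg table-spec layout ("statistics" -> "blob-metadata" -> "fields" /
"properties"), that the relevant blobs have type
"apache-datasketches-theta-v1", that the NDV is stored as a string, and
that stats are selected by snapshot id. The helper name
ndvs_from_metadata is hypothetical, and only single-column blobs are
considered.

```python
import json
from typing import Dict


def ndvs_from_metadata(metadata_json: str, snapshot_id: int) -> Dict[int, int]:
    """Map field id -> NDV taken from the blob's "ndv" property.

    Hypothetical helper: assumes the Iceberg table-spec layout of the
    "statistics" section and skips blobs covering more than one field.
    """
    metadata = json.loads(metadata_json)
    result: Dict[int, int] = {}
    for stats_file in metadata.get("statistics", []):
        # Assumption: only stats written for the requested snapshot are used.
        if stats_file.get("snapshot-id") != snapshot_id:
            continue
        for blob in stats_file.get("blob-metadata", []):
            if blob.get("type") != "apache-datasketches-theta-v1":
                continue
            fields = blob.get("fields", [])
            ndv = blob.get("properties", {}).get("ndv")
            # Without the property, the NDV has to come from the Puffin file.
            if len(fields) != 1 or ndv is None:
                continue
            result[fields[0]] = int(ndv)
    return result
```

Fields missing from the returned map are exactly the ones whose Theta
sketch would still have to be read from the Puffin file.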

After this change, Impala only opens and reads a Puffin stats file if it
contains stats for at least one column for which the "ndv" property is
not set in the metadata.json file.
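The file-selection rule above can be sketched as follows. This is a
hedged Python illustration, not the Java implementation: the helper name
puffin_files_to_read is hypothetical, and it assumes the Iceberg
table-spec layout of the "statistics" section, Theta-sketch blobs of type
"apache-datasketches-theta-v1", and selection of stats by snapshot id. A
Puffin file is reported as needing a read only if at least one of its
blobs lacks the "ndv" property.

```python
import json
from typing import List


def puffin_files_to_read(metadata_json: str, snapshot_id: int) -> List[str]:
    """Return paths of Puffin stats files that still have to be opened.

    Hypothetical helper: a file is skipped when every Theta-sketch blob it
    describes already carries an "ndv" property in metadata.json.
    """
    metadata = json.loads(metadata_json)
    paths: List[str] = []
    for stats_file in metadata.get("statistics", []):
        # Assumption: only stats written for the requested snapshot matter.
        if stats_file.get("snapshot-id") != snapshot_id:
            continue
        blobs = [b for b in stats_file.get("blob-metadata", [])
                 if b.get("type") == "apache-datasketches-theta-v1"]
        # One blob without "ndv" is enough to require opening the file.
        if any("ndv" not in b.get("properties", {}) for b in blobs):
            paths.append(stats_file["statistics-path"])
    return paths
```

Checking the blob metadata first keeps the common Trino-written case
free of any Puffin file I/O, which is also why a file with corrupt
sketches but valid "ndv" properties is never opened.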

Testing:
 - added a test in test_iceberg_with_puffin.py that verifies that the
   Puffin stats file is not read if the metadata.json file contains
   the NDV property. It uses the newly added stats file with corrupt
   datasketches: 'metadata_ndv_ok_sketches_corrupt.stats'.

Change-Id: I5e92056ce97c4849742db6309562af3b575f647b
---
M fe/src/main/java/org/apache/impala/catalog/PuffinStatsLoader.java
M java/puffin-data-generator/src/main/java/org/apache/impala/puffindatagenerator/PuffinDataGenerator.java
A testdata/ice_puffin/generated/metadata_ndv_ok_sketches_corrupt.stats
A testdata/ice_puffin/generated/metadata_ndv_ok_stats_file_corrupt.metadata.json
A testdata/ice_puffin/generated/multiple_field_ids.metadata.json
A testdata/ice_puffin/generated/multiple_field_ids.stats
M tests/custom_cluster/test_iceberg_with_puffin.py
7 files changed, 673 insertions(+), 49 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/21959/8
--
To view, visit http://gerrit.cloudera.org:8080/21959

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5e92056ce97c4849742db6309562af3b575f647b
Gerrit-Change-Number: 21959
Gerrit-PatchSet: 8
Gerrit-Owner: Daniel Becker
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Daniel Becker
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Peter Rozsa
Gerrit-Reviewer: Zoltan Borok-Nagy
