[ https://issues.apache.org/jira/browse/IMPALA-13370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17905305#comment-17905305 ]
ASF subversion and git services commented on IMPALA-13370:
----------------------------------------------------------
Commit b49f45eacb04fbceb99dabbac9ddf25a35dea0a9 in impala's branch
refs/heads/master from Daniel Becker
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=b49f45eac ]
IMPALA-13588: Update Puffin reading doc after IMPALA-13370
IMPALA-13370 added support for reading Puffin NDV stats from the
metadata.json if the "NDV" property is available. This change updates
the docs accordingly.
Change-Id: I95f5454d736ffb3a2c043f9b490c62976ccd0c2a
Reviewed-on: http://gerrit.cloudera.org:8080/22140
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Noemi Pap-Takacs <[email protected]>
Reviewed-by: Peter Rozsa <[email protected]>
> Read Puffin stats from metadata.json property if available
> ----------------------------------------------------------
>
> Key: IMPALA-13370
> URL: https://issues.apache.org/jira/browse/IMPALA-13370
> Project: IMPALA
> Issue Type: Sub-task
> Components: Frontend
> Reporter: Daniel Becker
> Assignee: Daniel Becker
> Priority: Major
> Labels: impala-iceberg
>
> When Trino writes Puffin stats for a column, it includes the NDV as a
> property in the "statistics" section of the metadata.json file, in addition
> to the Theta sketch in the Puffin file. When we are only reading the stats
> and not writing or updating them, it is enough to read this property, if it
> is present, instead of reading the sketch from the Puffin file.
> An example of the "statistics" section:
> {code:java}
> "statistics" : [ {
>   "snapshot-id" : 1226095104912303892,
>   "statistics-path" : "hdfs://localhost:20500/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/metadata/20240829_112839_00004_p6sck-7f433a45-607b-4561-89a3-fc4c58ef60d9.stats",
>   "file-size-in-bytes" : 306,
>   "file-footer-size-in-bytes" : 257,
>   "blob-metadata" : [ {
>     "type" : "apache-datasketches-theta-v1",
>     "snapshot-id" : 1226095104912303892,
>     "sequence-number" : 4,
>     "fields" : [ 1 ],
>     "properties" : {
>       "ndv" : "2"
>     }
>   } ]
> } ]{code}
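The lookup described above can be sketched as follows. This is a minimal illustration only, assuming a parsed metadata.json whose top-level "statistics" array has the shape shown in the example; the helper name `read_ndv_from_metadata` is hypothetical, and Impala's own implementation lives in its Java frontend and is not reproduced here. It collects the "ndv" property of each Theta-sketch blob for one snapshot; a column that is missing from the result would still require reading the sketch from the Puffin file, which this sketch does not attempt.

```python
import json
from typing import Dict


def read_ndv_from_metadata(metadata_json: str, snapshot_id: int) -> Dict[int, int]:
    """Hypothetical helper: map Iceberg field id -> NDV for one snapshot.

    Only uses the "ndv" property recorded in the "blob-metadata" of
    "apache-datasketches-theta-v1" blobs; it never opens the Puffin file.
    """
    metadata = json.loads(metadata_json)
    ndvs: Dict[int, int] = {}
    for stats_file in metadata.get("statistics", []):
        # Each statistics file entry is tied to the snapshot it describes.
        if stats_file.get("snapshot-id") != snapshot_id:
            continue
        for blob in stats_file.get("blob-metadata", []):
            if blob.get("type") != "apache-datasketches-theta-v1":
                continue
            ndv = blob.get("properties", {}).get("ndv")
            fields = blob.get("fields", [])
            # No "ndv" property: the caller would have to fall back to the
            # Theta sketch itself. A Theta blob is expected to cover one field.
            if ndv is None or len(fields) != 1:
                continue
            # The property value is stored as a string, e.g. "2".
            ndvs[fields[0]] = int(ndv)
    return ndvs
```

Applied to the example above (wrapped in `{ }` to form a complete JSON object) with snapshot id 1226095104912303892, this returns `{1: 2}`. Returning only the columns that carry the property leaves the decision to read the Puffin file to the caller.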
--
This message was sent by Atlassian Jira
(v8.20.10#820010)