This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit b49f45eacb04fbceb99dabbac9ddf25a35dea0a9
Author: Daniel Becker <[email protected]>
AuthorDate: Thu Nov 28 12:20:15 2024 +0100

    IMPALA-13588: Update Puffin reading doc after IMPALA-13370

    IMPALA-13370 added support for reading Puffin NDV stats from the
    metadata.json if the "NDV" property is available. This change updates
    the docs accordingly.

    Change-Id: I95f5454d736ffb3a2c043f9b490c62976ccd0c2a
    Reviewed-on: http://gerrit.cloudera.org:8080/22140
    Tested-by: Impala Public Jenkins <[email protected]>
    Reviewed-by: Noemi Pap-Takacs <[email protected]>
    Reviewed-by: Peter Rozsa <[email protected]>
---
 docs/topics/impala_iceberg.xml | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/docs/topics/impala_iceberg.xml b/docs/topics/impala_iceberg.xml
index 71d4d2745..2921e2b03 100644
--- a/docs/topics/impala_iceberg.xml
+++ b/docs/topics/impala_iceberg.xml
@@ -879,6 +879,12 @@ ORDER BY made_current_at;
           values in the HMS may be stale.
         </p>
         <p>
+          Some engines, e.g. Trino, also write the NDV as a property (with key "ndv") in the
+          "statistics" section of the metadata.json file for each blob, in addition to the
+          Puffin file. If such a property is present for a blob, Impala will read the value
+          from the metadata.json file instead of the Puffin file to reduce file I/O.
+        </p>
+        <p>
           Note that it is currently not possible to drop Puffin stats from Impala. For this
           reason, it is possible to disable reading Puffin stats in two ways:
           <ul>
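
For readers who want to picture the lookup described in the added paragraph, here is a minimal Python sketch. It is not Impala's implementation. It assumes the metadata.json layout of the Iceberg table spec as I understand it: a top-level "statistics" list whose entries carry a "blob-metadata" list, where each blob may have "fields" and a "properties" map holding "ndv" as a string. The function name `ndv_sources`, the sample path, and the sample values are hypothetical.

```python
import json
from typing import Dict, Optional, Tuple


def ndv_sources(metadata: dict) -> Dict[Tuple[int, ...], Tuple[str, Optional[int]]]:
    """Map each blob's field IDs to where its NDV would come from.

    Hypothetical helper: returns ("metadata.json", ndv) when the blob has an
    "ndv" property, otherwise ("puffin", None) to signal that the Puffin file
    itself would have to be read.
    """
    sources = {}
    for stats_file in metadata.get("statistics", []):
        for blob in stats_file.get("blob-metadata", []):
            fields = tuple(blob.get("fields", []))
            # "properties" is optional, so treat a missing or null map as empty.
            ndv = (blob.get("properties") or {}).get("ndv")
            if ndv is not None:
                sources[fields] = ("metadata.json", int(ndv))
            else:
                sources[fields] = ("puffin", None)
    return sources


# Made-up metadata fragment; the path and numbers are illustrative only.
SAMPLE_METADATA = json.loads("""
{
  "statistics": [
    {
      "snapshot-id": 1,
      "statistics-path": "s3://example-bucket/stats.puffin",
      "blob-metadata": [
        {"type": "apache-datasketches-theta-v1", "fields": [1],
         "properties": {"ndv": "42"}},
        {"type": "apache-datasketches-theta-v1", "fields": [2]}
      ]
    }
  ]
}
""")

if __name__ == "__main__":
    for fields, (source, ndv) in ndv_sources(SAMPLE_METADATA).items():
        print(fields, source, ndv)
```

In this sketch only the blob for field 2 would require opening the Puffin file, which is the I/O saving the doc change refers to.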
