This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit b49f45eacb04fbceb99dabbac9ddf25a35dea0a9
Author: Daniel Becker <[email protected]>
AuthorDate: Thu Nov 28 12:20:15 2024 +0100

    IMPALA-13588: Update Puffin reading doc after IMPALA-13370
    
    IMPALA-13370 added support for reading Puffin NDV stats from the
    metadata.json if the "NDV" property is available. This change updates
    the docs accordingly.
    
    Change-Id: I95f5454d736ffb3a2c043f9b490c62976ccd0c2a
    Reviewed-on: http://gerrit.cloudera.org:8080/22140
    Tested-by: Impala Public Jenkins <[email protected]>
    Reviewed-by: Noemi Pap-Takacs <[email protected]>
    Reviewed-by: Peter Rozsa <[email protected]>
---
 docs/topics/impala_iceberg.xml | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/docs/topics/impala_iceberg.xml b/docs/topics/impala_iceberg.xml
index 71d4d2745..2921e2b03 100644
--- a/docs/topics/impala_iceberg.xml
+++ b/docs/topics/impala_iceberg.xml
@@ -879,6 +879,12 @@ ORDER BY made_current_at;
       values in the HMS may be stale.
       </p>
       <p>
+      Some engines, e.g. Trino, also write the NDV as a property (with key "ndv") in the
+      "statistics" section of the metadata.json file for each blob, in addition to the
+      Puffin file. If such a property is present for a blob, Impala will read the value
+      from the metadata.json file instead of the Puffin file to reduce file I/O.
+      </p>
+      <p>
       Note that it is currently not possible to drop Puffin stats from Impala.
       For this reason, it is possible to disable reading Puffin stats in two ways:
       <ul>
