This is an automated email from the ASF dual-hosted git repository.

dbecker pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git


The following commit(s) were added to refs/heads/master by this push:
     new 64e43ad46 IMPALA-13410: Document reading Puffin files
64e43ad46 is described below

commit 64e43ad469c46e3b327ffd0e119f1a4613012334
Author: Daniel Becker <[email protected]>
AuthorDate: Wed Oct 2 14:23:05 2024 +0200

    IMPALA-13410: Document reading Puffin files
    
    IMPALA-13247 introduced support for reading Puffin files belonging to
    the current snapshot. This change documents it.
    
    Change-Id: Ib2975a67aadd948d9451f44a1c884349161c19d2
    Reviewed-on: http://gerrit.cloudera.org:8080/21870
    Reviewed-by: Peter Rozsa <[email protected]>
    Tested-by: Impala Public Jenkins <[email protected]>
    Reviewed-by: Csaba Ringhofer <[email protected]>
---
 docs/impala_keydefs.ditamap    |  4 ++++
 docs/topics/impala_iceberg.xml | 39 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+)

diff --git a/docs/impala_keydefs.ditamap b/docs/impala_keydefs.ditamap
index f97d10a6f..fdbb5aa69 100644
--- a/docs/impala_keydefs.ditamap
+++ b/docs/impala_keydefs.ditamap
@@ -57,6 +57,10 @@ under the License.
     <topicmeta><linktext>the Apache Iceberg site</linktext></topicmeta>
   </keydef>
 
+  <keydef href="https://iceberg.apache.org/puffin-spec" scope="external" format="html" keys="upstream_iceberg_puffin_site">
+    <topicmeta><linktext>the Apache Iceberg Puffin site</linktext></topicmeta>
+  </keydef>
+
   <keydef href="https://ozone.apache.org" scope="external" format="html" keys="upstream_ozone_site">
     <topicmeta><linktext>the Apache Ozone site</linktext></topicmeta>
   </keydef>
diff --git a/docs/topics/impala_iceberg.xml b/docs/topics/impala_iceberg.xml
index 37aea6a32..71d4d2745 100644
--- a/docs/topics/impala_iceberg.xml
+++ b/docs/topics/impala_iceberg.xml
@@ -857,6 +857,45 @@ ORDER BY made_current_at;
     </conbody>
   </concept>
 
+  <concept id="iceberg_puffin_stats">
+    <title>Iceberg Puffin statistics</title>
+    <conbody>
+      <p>
+      Impala supports reading NDV (Number of Distinct Values) statistics from Puffin files.
+      For the Puffin specification, see <xref keyref="upstream_iceberg_puffin_site"/>.
+      </p>
+      <p>
+      Impala only reads Puffin stats when they are available for the current snapshot.
+      Puffin files or blobs that were written for other snapshots than the current one
+      are ignored. This behaviour is different from how Impala treats HMS stats, where
+      older stats can also be used - see <xref keyref="perf_stats"/> for more.
+      As this may be unintuitive for users, reading Puffin stats is disabled by default;
+      set the "--disable_reading_puffin_stats" startup flag to false to enable it.
+      </p>
+      <p>
+      When Puffin stats reading is enabled, the NDV values read from Puffin files take
+      precedence over NDV values stored in the HMS. This is because we only read Puffin
+      stats for the current snapshot, so these values are always up-to-date, while the
+      values in the HMS may be stale.
+      </p>
+      <p>
+      Note that it is currently not possible to drop Puffin stats from Impala.
+      For this reason, it is possible to disable reading Puffin stats in two ways:
+      <ul>
+        <li>Globally, with the aforementioned
+            <codeph>disable_reading_puffin_stats</codeph> startup flag - when it is set
+            to true, Impala will never read Puffin stats.</li>
+        <li>For specific tables, by setting the
+            <codeph>impala.iceberg_disable_reading_puffin_stats</codeph> table property
+            to "true".</li>
+      </ul>
+      </p>
+      <p>
+      Note that Impala does not yet support writing Puffin statistics files.
+      </p>
+    </conbody>
+  </concept>
+
   <concept id="iceberg_table_cloning">
     <title>Cloning Iceberg tables (LIKE clause)</title>
     <conbody>
