This is an automated email from the ASF dual-hosted git repository.
dbecker pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git
The following commit(s) were added to refs/heads/master by this push:
new 64e43ad46 IMPALA-13410: Document reading Puffin files
64e43ad46 is described below
commit 64e43ad469c46e3b327ffd0e119f1a4613012334
Author: Daniel Becker <[email protected]>
AuthorDate: Wed Oct 2 14:23:05 2024 +0200
IMPALA-13410: Document reading Puffin files
IMPALA-13247 introduced support for reading Puffin files belonging to
the current snapshot. This change documents it.
Change-Id: Ib2975a67aadd948d9451f44a1c884349161c19d2
Reviewed-on: http://gerrit.cloudera.org:8080/21870
Reviewed-by: Peter Rozsa <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Csaba Ringhofer <[email protected]>
---
docs/impala_keydefs.ditamap | 4 ++++
docs/topics/impala_iceberg.xml | 39 +++++++++++++++++++++++++++++++++++++++
2 files changed, 43 insertions(+)
diff --git a/docs/impala_keydefs.ditamap b/docs/impala_keydefs.ditamap
index f97d10a6f..fdbb5aa69 100644
--- a/docs/impala_keydefs.ditamap
+++ b/docs/impala_keydefs.ditamap
@@ -57,6 +57,10 @@ under the License.
<topicmeta><linktext>the Apache Iceberg site</linktext></topicmeta>
</keydef>
+ <keydef href="https://iceberg.apache.org/puffin-spec" scope="external" format="html" keys="upstream_iceberg_puffin_site">
+ <topicmeta><linktext>the Apache Iceberg Puffin site</linktext></topicmeta>
+ </keydef>
+
<keydef href="https://ozone.apache.org" scope="external" format="html" keys="upstream_ozone_site">
<topicmeta><linktext>the Apache Ozone site</linktext></topicmeta>
</keydef>
diff --git a/docs/topics/impala_iceberg.xml b/docs/topics/impala_iceberg.xml
index 37aea6a32..71d4d2745 100644
--- a/docs/topics/impala_iceberg.xml
+++ b/docs/topics/impala_iceberg.xml
@@ -857,6 +857,45 @@ ORDER BY made_current_at;
</conbody>
</concept>
+ <concept id="iceberg_puffin_stats">
+ <title>Iceberg Puffin statistics</title>
+ <conbody>
+ <p>
+ Impala supports reading NDV (Number of Distinct Values) statistics from Puffin files.
+ For the Puffin specification, see <xref keyref="upstream_iceberg_puffin_site"/>.
+ </p>
+ <p>
+ Impala only reads Puffin stats when they are available for the current snapshot.
+ Puffin files or blobs that were written for snapshots other than the current one
+ are ignored. This behaviour is different from how Impala treats HMS stats, where
+ older stats can also be used; see <xref keyref="perf_stats"/> for details.
+ As this may be unintuitive for users, reading Puffin stats is disabled by default;
+ set the "--disable_reading_puffin_stats" startup flag to false to enable it.
+ </p>
+ <p>
+ When Puffin stats reading is enabled, the NDV values read from Puffin files take
+ precedence over NDV values stored in the HMS. This is because we only read Puffin
+ stats for the current snapshot, so these values are always up-to-date, while the
+ values in the HMS may be stale.
+ </p>
+ <p>
+ Note that it is currently not possible to drop Puffin stats from Impala.
+ For this reason, it is possible to disable reading Puffin stats in two ways:
+ <ul>
+ <li>Globally, with the aforementioned
+ <codeph>disable_reading_puffin_stats</codeph> startup flag; when it is set
+ to true, Impala will never read Puffin stats.</li>
+ <li>For specific tables, by setting the
+ <codeph>impala.iceberg_disable_reading_puffin_stats</codeph> table property
+ to "true".</li>
+ </ul>
+ </p>
+ <p>
+ Note that Impala does not yet support writing Puffin statistics files.
+ </p>
+ </conbody>
+ </concept>
+
<concept id="iceberg_table_cloning">
<title>Cloning Iceberg tables (LIKE clause)</title>
<conbody>