This is an automated email from the ASF dual-hosted git repository.
boroknagyz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git
The following commit(s) were added to refs/heads/master by this push:
new 9b05a205f IMPALA-13000: Document OPTIMIZE TABLE
9b05a205f is described below
commit 9b05a205fec397fa1e19ae467b1cc406ca43d948
Author: Noemi Pap-Takacs <[email protected]>
AuthorDate: Wed Apr 17 15:04:54 2024 +0200
IMPALA-13000: Document OPTIMIZE TABLE
Document OPTIMIZE TABLE syntax and behaviour.
Testing:
- built docs locally
Change-Id: I851669686ed4da610dcac97c9b88ff23b0a4a647
Reviewed-on: http://gerrit.cloudera.org:8080/21320
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Zoltan Borok-Nagy <[email protected]>
Reviewed-by: Daniel Becker <[email protected]>
---
docs/topics/impala_iceberg.xml | 47 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 47 insertions(+)
diff --git a/docs/topics/impala_iceberg.xml b/docs/topics/impala_iceberg.xml
index fce5aaa76..4cc95503a 100644
--- a/docs/topics/impala_iceberg.xml
+++ b/docs/topics/impala_iceberg.xml
@@ -546,6 +546,53 @@ UPDATE ice_t SET ice_t.k = o.k, ice_t.j = o.j, FROM ice_t, other_table o where i
</conbody>
</concept>
+ <concept id="iceberg_optimize_table">
+ <title>Optimizing (Compacting) Iceberg tables</title>
+ <conbody>
+ <p>
+ Frequent updates and row-level modifications on Iceberg tables can write many small
+ data files and delete files, which have to be merged-on-read.
+ This causes read performance to degrade over time.
+ The following statement can be used to compact the table and optimize it for reading.
+ <codeblock>
+OPTIMIZE TABLE [<varname>db_name</varname>.]<varname>table_name</varname>;
+ </codeblock>
+ </p>
+
+ <p>
+ The current implementation of the <codeph>OPTIMIZE TABLE</codeph> statement rewrites
+ the entire table, executing the following tasks:
+ <ul>
+ <li>compact small files</li>
+ <li>merge delete and update deltas</li>
+ <li>rewrite all files, converting them to the latest table schema</li>
+ <li>rewrite all partitions according to the latest partition spec</li>
+ </ul>
+ </p>
+
+ <p>
+ To execute table optimization:
+ <ul>
+ <li>The user needs ALL privileges on the table.</li>
+ <li>The table can contain any file formats that Impala can read, but <codeph>write.format.default</codeph>
+ has to be <codeph>parquet</codeph>.</li>
+ <li>The table cannot contain complex types.</li>
+ </ul>
+ </p>
+
+ <p>
+ When a table is optimized, a new snapshot is created. The old table state is still
+ accessible by time travel to previous snapshots, because the rewritten data and
+ delete files are not removed physically.
+ </p>
+ <p>
+ Note that the current implementation of <codeph>OPTIMIZE TABLE</codeph> rewrites
+ the entire table, therefore this operation can take a long time to complete
+ depending on the size of the table.
+ </p>
+ </conbody>
+ </concept>
+
<concept id="iceberg_time_travel">
<title>Time travel for Iceberg tables</title>
<conbody>