(impala) branch master updated: IMPALA-13000: Document OPTIMIZE TABLE

boroknagyz Mon, 22 Apr 2024 03:48:49 -0700

This is an automated email from the ASF dual-hosted git repository.

boroknagyz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git



The following commit(s) were added to refs/heads/master by this push:
     new 9b05a205f IMPALA-13000: Document OPTIMIZE TABLE
9b05a205f is described below

commit 9b05a205fec397fa1e19ae467b1cc406ca43d948
Author: Noemi Pap-Takacs <[email protected]>
AuthorDate: Wed Apr 17 15:04:54 2024 +0200

    IMPALA-13000: Document OPTIMIZE TABLE
    
    Document OPTIMIZE TABLE syntax and behaviour.
    
    Testing:
     - built docs locally
    
    Change-Id: I851669686ed4da610dcac97c9b88ff23b0a4a647
    Reviewed-on: http://gerrit.cloudera.org:8080/21320
    Tested-by: Impala Public Jenkins <[email protected]>
    Reviewed-by: Zoltan Borok-Nagy <[email protected]>
    Reviewed-by: Daniel Becker <[email protected]>
---
 docs/topics/impala_iceberg.xml | 47 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/docs/topics/impala_iceberg.xml b/docs/topics/impala_iceberg.xml
index fce5aaa76..4cc95503a 100644
--- a/docs/topics/impala_iceberg.xml
+++ b/docs/topics/impala_iceberg.xml
@@ -546,6 +546,53 @@ UPDATE ice_t SET ice_t.k = o.k, ice_t.j = o.j, FROM ice_t, other_table o where i
     </conbody>
   </concept>
 
+  <concept id="iceberg_optimize_table">
+    <title>Optimizing (Compacting) Iceberg tables</title>
+    <conbody>
+      <p>
+        Frequent updates and row-level modifications on Iceberg tables can write many small
+        data files and delete files, which have to be merged-on-read.
+        This causes read performance to degrade over time.
+        The following statement can be used to compact the table and optimize it for reading.
+        <codeblock>
+OPTIMIZE TABLE [<varname>db_name</varname>.]<varname>table_name</varname>;
+        </codeblock>
+      </p>
+
+      <p>
+        The current implementation of the <codeph>OPTIMIZE TABLE</codeph> statement rewrites
+        the entire table, executing the following tasks:
+        <ul>
+          <li>compact small files</li>
+          <li>merge delete and update deltas</li>
+          <li>rewrite all files, converting them to the latest table schema</li>
+          <li>rewrite all partitions according to the latest partition spec</li>
+        </ul>
+      </p>
+
+      <p>
+        To execute table optimization:
+        <ul>
+          <li>The user needs ALL privileges on the table.</li>
+          <li>The table can contain any file formats that Impala can read, but <codeph>write.format.default</codeph>
+          has to be <codeph>parquet</codeph>.</li>
+          <li>The table cannot contain complex types.</li>
+        </ul>
+      </p>
+
+      <p>
+        When a table is optimized, a new snapshot is created. The old table state is still
+        accessible by time travel to previous snapshots, because the rewritten data and
+        delete files are not removed physically.
+      </p>
+      <p>
+        Note that the current implementation of <codeph>OPTIMIZE TABLE</codeph> rewrites
+        the entire table; therefore, this operation can take a long time to complete
+        depending on the size of the table.
+      </p>
+    </conbody>
+  </concept>
+
   <concept id="iceberg_time_travel">
     <title>Time travel for Iceberg tables</title>
     <conbody>
