This is an automated email from the ASF dual-hosted git repository. joemcdonnell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/impala.git
commit 12276c79f9975dc63322138ea56290434a49221d Author: Zoltan Borok-Nagy <[email protected]> AuthorDate: Tue Aug 15 14:47:03 2023 +0200 IMPALA-12335: [DOCS] Add documentation about the DELETE statement IMPALA-11877 added support for the DELETE statement for Iceberg tables. This patch documents this feature. Change-Id: If111a7ecd20bda2d4928332ef2ccd905814cb203 Reviewed-on: http://gerrit.cloudera.org:8080/20361 Reviewed-by: Zoltan Borok-Nagy <[email protected]> Tested-by: Impala Public Jenkins <[email protected]> --- docs/impala_keydefs.ditamap | 1 + docs/topics/impala_delete.xml | 35 +++++++++++++---------- docs/topics/impala_iceberg.xml | 63 ++++++++++++++++++++++++++++++++++++++---- 3 files changed, 80 insertions(+), 19 deletions(-) diff --git a/docs/impala_keydefs.ditamap b/docs/impala_keydefs.ditamap index 737da1089..601d8208a 100644 --- a/docs/impala_keydefs.ditamap +++ b/docs/impala_keydefs.ditamap @@ -10529,6 +10529,7 @@ under the License. <keydef href="https://issues.apache.org/jira/browse/IMPALA-9999" scope="external" format="html" keys="IMPALA-9999"/> <!-- Short form of mapping from Impala release to vendor-specific releases, for use in headings. --> + <keydef keys="impala43"><topicmeta><keywords><keyword>Impala 4.3</keyword></keywords></topicmeta></keydef> <keydef keys="impala42"><topicmeta><keywords><keyword>Impala 4.2</keyword></keywords></topicmeta></keydef> <keydef keys="impala41"><topicmeta><keywords><keyword>Impala 4.1</keyword></keywords></topicmeta></keydef> <keydef keys="impala40"><topicmeta><keywords><keyword>Impala 4.0</keyword></keywords></topicmeta></keydef> diff --git a/docs/topics/impala_delete.xml b/docs/topics/impala_delete.xml index c8591b6bb..5611e9d42 100644 --- a/docs/topics/impala_delete.xml +++ b/docs/topics/impala_delete.xml @@ -27,6 +27,7 @@ under the License. <data name="Category" value="Impala"/> <data name="Category" value="SQL"/> <data name="Category" value="Kudu"/> + <data name="Category" value="Iceberg"/> <data name="Category" value="ETL"/> <data name="Category" value="Ingest"/> <data name="Category" value="DML"/> @@ -39,8 +40,8 @@ under the License. <p> <indexterm audience="hidden">DELETE statement</indexterm> - Deletes an arbitrary number of rows from a Kudu table. - This statement only works for Impala tables that use the Kudu storage engine. + Deletes an arbitrary number of rows from a table. + This statement only works for Kudu (from <keyword keyref="impala28"/>) and Iceberg tables (from <keyword keyref="impala43"/>). </p> <p conref="../shared/impala_common.xml#common/syntax_blurb"/> @@ -61,8 +62,8 @@ DELETE <varname>table_ref</varname> FROM [<varname>joined_table_refs</varname>] <p> The second form evaluates one or more join clauses, and deletes all matching rows from one of the tables. The join clauses can - include non-Kudu tables, but the table from which the rows - are deleted must be a Kudu table. The <codeph>FROM</codeph> + include tables of any kind, but the table from which the rows + are deleted must be a Kudu or Iceberg table. The <codeph>FROM</codeph> keyword is required in this case, to separate the name of the table whose rows are being deleted from the table names of the join clauses. @@ -80,7 +81,7 @@ DELETE <varname>table_ref</varname> FROM [<varname>joined_table_refs</varname>] The conditions in the <codeph>WHERE</codeph> clause can refer to any combination of primary key columns or other columns. Referring to primary key columns in the <codeph>WHERE</codeph> clause is more efficient - than referring to non-primary key columns. + than referring to non-primary key columns (in case of Kudu tables). </p> <p> @@ -113,6 +114,11 @@ DELETE <varname>table_ref</varname> FROM [<varname>joined_table_refs</varname>] </li> </ul> + <p> + Iceberg also allows concurrent DELETE operations, in which case the concurrent DELETEs might + remove the same rows. This won't corrupt the table as Iceberg allows such behavior. + </p> + <p> The number of affected rows is reported in an <cmdname>impala-shell</cmdname> message and in the query profile. @@ -132,16 +138,16 @@ DELETE <varname>table_ref</varname> FROM [<varname>joined_table_refs</varname>] <codeblock> -- Deletes all rows. The FROM keyword is optional. -DELETE FROM kudu_table; -DELETE kudu_table; +DELETE FROM table; +DELETE table; -- Deletes 0, 1, or more rows. -- (If c1 is a single-column primary key, the statement could only -- delete 0 or 1 rows.) -DELETE FROM kudu_table WHERE c1 = 100; +DELETE FROM table WHERE c1 = 100; -- Deletes all rows that match all the WHERE conditions. -DELETE FROM kudu_table WHERE +DELETE FROM table WHERE (c1 > c2 OR c3 IN ('hello','world')) AND c4 IS NOT NULL; DELETE FROM t1 WHERE (c1 IN (1,2,3) AND c2 > c3) OR c4 IS NOT NULL; @@ -153,7 +159,7 @@ DELETE FROM t1 WHERE c5 IN (SELECT DISTINCT other_col FROM other_table); -- Does not delete any rows, because the WHERE condition is always false. -DELETE FROM kudu_table WHERE 1 = 0; +DELETE FROM table WHERE 1 = 0; </codeblock> <p> @@ -169,17 +175,18 @@ DELETE t1 FROM t1 JOIN t2 ON t1.x = t2.x; DELETE t1 FROM t1 JOIN t2 ON t1.x = t2.x WHERE t1.y = FALSE and t2.z > 100; --- Delete from a Kudu table based on a join with a non-Kudu table. -DELETE t1 FROM kudu_table t1 JOIN non_kudu_table t2 ON t1.x = t2.x; +-- Delete from a table based on a join with another table. +DELETE t1 FROM table t1 JOIN other_table t2 ON t1.x = t2.x; --- The tables can be joined in any order as long as the Kudu table +-- The tables can be joined in any order as long as the Kudu or Iceberg table -- is specified as the deletion target. -DELETE t2 FROM non_kudu_table t1 JOIN kudu_table t2 ON t1.x = t2.x; +DELETE t2 FROM non_kudu_non_ice_table t1 JOIN kudu_or_ice_table t2 ON t1.x = t2.x; </codeblock> <p conref="../shared/impala_common.xml#common/related_info"/> <p> + <xref href="impala_iceberg.xml#impala_iceberg"/>, <xref href="impala_kudu.xml#impala_kudu"/>, <xref href="impala_insert.xml#insert"/>, <xref href="impala_update.xml#update"/>, <xref href="impala_upsert.xml#upsert"/> </p> diff --git a/docs/topics/impala_iceberg.xml b/docs/topics/impala_iceberg.xml index 1504efcdb..66dd7fd8c 100644 --- a/docs/topics/impala_iceberg.xml +++ b/docs/topics/impala_iceberg.xml @@ -43,10 +43,6 @@ under the License. Iceberg Catalogs (e.g. HiveCatalog, HadoopCatalog). It also supports location-based tables (HadoopTables). </p> - <p> - Currently only Iceberg V1 DML operations are allowed, i.e. INSERT INTO /INSERT OVERWRITE. - Iceberg V2 operations like row-level modifications (UPDATE, DELETE) are not supported yet. - </p> <p> For more information on Iceberg, see <xref keyref="upstream_iceberg_site"/>. @@ -214,6 +210,37 @@ CREATE TABLE ice_ctas_part_spec PARTITIONED BY SPEC (truncate(3, s)) STORED AS I </conbody> </concept> + <concept id="iceberg_v2"> + <title>Iceberg V2 tables</title> + <conbody> + <p> + Iceberg V2 tables support row-level modifications (DELETE, UPDATE) via "merge-on-read", which means instead + of rewriting existing data files, separate so-called delete files are being written that store information + about the deleted records. There are two kinds of delete files in Iceberg: + <ul> + <li>position deletes</li> + <li>equality deletes</li> + </ul> + Impala only supports position delete files. These files contain the file path and file position of the deleted + rows. + </p> + <p> + One can create Iceberg V2 tables via the <codeph>CREATE TABLE</codeph> statement, they just need to specify + the 'format-version' table property: + <codeblock> +CREATE TABLE ice_v2 (i int) STORED BY ICEBERG TBLPROPERTIES('format-version'='2'); + </codeblock> + </p> + <p> + It is also possible to upgrade existing Iceberg V1 tables to Iceberg V2 tables. One can use the following + <codeph>ALTER TABLE</codeph> statement to do so: + <codeblock> +ALTER TABLE ice_v1_to_v2 SET TBLPROPERTIES('format-version'='2'); + </codeblock> + </p> + </conbody> + </concept> + <concept id="iceberg_drop"> <title>Dropping Iceberg tables</title> <conbody> @@ -418,7 +445,7 @@ CREATE TABLE ice_p (i INT, b INT) PARTITIONED BY (p1 INT, p2 STRING) STORED AS I </concept> <concept id="iceberg_inserts"> - <title>Writing Iceberg tables</title> + <title>Inserting data into Iceberg tables</title> <conbody> <p> Impala is also able to insert new data to Iceberg tables. Currently the <codeph>INSERT INTO</codeph> @@ -447,6 +474,32 @@ INSERT INTO ice_p VALUES (1, 2); </conbody> </concept> + <concept id="iceberg_delete"> + <title>Delete data from Iceberg tables</title> + <conbody> + <p> + Since <keyword keyref="impala43"/> Impala is able to run <codeph>DELETE</codeph> statements against + Iceberg V2 tables. E.g.: + <codeblock> +DELETE FROM ice_t where i = 3; + </codeblock> + </p> + <p> + More information about the <codeph>DELETE</codeph> statement can be found at <xref href="impala_delete.xml#delete"/>. + </p> + </conbody> + </concept> + + <concept id="iceberg_update"> + <title>Updating data int Iceberg tables</title> + <conbody> + <p> + Impala does not yet support the UPDATE statement. One can mimic its behaviour by calling + DELETE + INSERT statements one after the other, but please note that this creates two transactions. + </p> + </conbody> + </concept> + <concept id="iceberg_load"> <title>Loading data into Iceberg tables</title> <conbody>
