This is an automated email from the ASF dual-hosted git repository.
michaelsmith pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git
The following commit(s) were added to refs/heads/master by this push:
new 141f8b97f IMPALA-14492: Document delete orphan files for Iceberg table
141f8b97f is described below
commit 141f8b97ffa9d466df15cbfaa4706e267e27e5b4
Author: Riza Suminto <[email protected]>
AuthorDate: Mon Oct 13 12:14:42 2025 -0700
IMPALA-14492: Document delete orphan files for Iceberg table
This patch adds documentation for the REMOVE_ORPHAN_FILES query added by
IMPALA-12337.
Change-Id: Ie8de6112bf9ccd879ea3e14d86e67b99e1087c0f
Reviewed-on: http://gerrit.cloudera.org:8080/23532
Reviewed-by: Michael Smith <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Zoltan Borok-Nagy <[email protected]>
---
docs/topics/impala_iceberg.xml | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)
diff --git a/docs/topics/impala_iceberg.xml b/docs/topics/impala_iceberg.xml
index 7756d43d3..c9d4689e4 100644
--- a/docs/topics/impala_iceberg.xml
+++ b/docs/topics/impala_iceberg.xml
@@ -811,6 +811,37 @@ ALTER TABLE ice_tbl EXECUTE expire_snapshots(now() - interval 5 days);
</conbody>
</concept>
+ <concept id="iceberg_remove_orphan_files">
+ <title>Removing orphan files</title>
+ <conbody>
+ <p>
+ Failures can leave files that are not referenced by table metadata. These are
+ called orphan files. In some cases, normal snapshot expiration may also be unable
+ to determine that a file is no longer needed and delete it. Impala can remove these
+ orphan files with the
+ <codeph>ALTER TABLE ... EXECUTE remove_orphan_files(...)</codeph>
+ statement, which removes all orphan files that have a modification time older
+ than the specified timestamp. For example:
+ <codeblock>
+-- Remove orphan files older than '2022-01-04 10:00:00'.
+ALTER TABLE ice_tbl EXECUTE remove_orphan_files('2022-01-04 10:00:00');
+
+-- Remove orphan files older than 5 days from now.
+ALTER TABLE ice_tbl EXECUTE remove_orphan_files(now() - interval 5 days);
+ </codeblock>
+ </p>
+ <p>
+ Note that this is a destructive query that deletes any file within the Iceberg
+ table's 'data' and 'metadata' directories that is not addressable by any valid
+ snapshot. It is dangerous to remove orphan files with a retention interval
+ shorter than the time expected for any write to complete, because it might corrupt
+ the table if in-progress files are considered orphaned and are deleted. It is
+ recommended to set the timestamp to a day ago or older for this remove orphan files
+ query.
+ </p>
+ </conbody>
+ </concept>
+
<concept id="iceberg_metadata_tables">
<title>Iceberg metadata tables</title>
<conbody>