[
https://issues.apache.org/jira/browse/IMPALA-12337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17987869#comment-17987869
]
ASF subversion and git services commented on IMPALA-12337:
----------------------------------------------------------
Commit c5072807df6ad5adb48b158540259c08d1ce0424 in impala's branch
refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=c5072807d ]
IMPALA-12337: Implement delete orphan files for Iceberg table
This patch implements the delete orphan files query for Iceberg tables.
The following statement becomes available for Iceberg tables:
- ALTER TABLE <tbl> EXECUTE remove_orphan_files(<timestamp>)
The bulk of the implementation is copied from Hive's implementation of
the org.apache.iceberg.actions.DeleteOrphanFiles interface (HIVE-27906,
6b2e21a93ef3c1776b689a7953fc59dbf52e4be4), which this patch renames to
ImpalaIcebergDeleteOrphanFiles.java. Upon execute(), the
ImpalaIcebergDeleteOrphanFiles instance gathers the URIs of all valid
data files and Iceberg metadata files using the Iceberg API. These
valid URIs are then compared against a recursive file listing obtained
through the Hadoop FileSystem API under the table's 'data' and
'metadata' directories, respectively. Any URI from the FileSystem
listing that matches no valid URI and has a modification time earlier
than the 'olderThanTimestamp' parameter is then removed via the Iceberg
FileIO API of the given Iceberg table. Note that this is a destructive
query that will wipe out any files within the Iceberg table's 'data'
and 'metadata' directories that are not addressable by any valid
snapshot.
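The comparison step described above can be sketched roughly as follows.
This is an illustrative Python sketch, not the Java code in the patch:
the names FileEntry and find_orphans are hypothetical, and the real
implementation obtains the valid URIs from the Iceberg API and the
listing from the Hadoop FileSystem API.

```python
from typing import Iterable, List, NamedTuple, Set


class FileEntry(NamedTuple):
    """Hypothetical stand-in for one entry of a recursive file listing."""
    uri: str
    modification_time_ms: int


def find_orphans(valid_uris: Set[str],
                 listed_files: Iterable[FileEntry],
                 older_than_ms: int) -> List[str]:
    """Return listed URIs that match no valid URI and predate the cutoff."""
    return [f.uri for f in listed_files
            if f.uri not in valid_uris
            and f.modification_time_ms < older_than_ms]


# Made-up example data: one referenced file, one stale orphan, and one
# unreferenced file that is too recent to be removed.
valid = {"hdfs://nn/tbl/data/a.parquet"}
listing = [
    FileEntry("hdfs://nn/tbl/data/a.parquet", 1_000),
    FileEntry("hdfs://nn/tbl/data/stale.parquet", 1_000),
    FileEntry("hdfs://nn/tbl/data/recent.parquet", 9_000),
]
orphans = find_orphans(valid, listing, older_than_ms=5_000)
```

In this sketch only stale.parquet qualifies: a.parquet is still
referenced, and recent.parquet is newer than the cutoff, so files
written by in-flight operations are left alone.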
The execution happens in CatalogD via
IcebergCatalogOpExecutor.alterTableExecuteRemoveOrphanFiles(). CatalogD
supplies CatalogOpExecutor.icebergExecutorService_ as the executor
service used to run the Iceberg planFiles API and the FileIO API for
deletion.
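The idea of handing a caller-owned executor service to the deletion
step can be sketched as below. This is a Python analogy under stated
assumptions, not the patch's Java code: delete_all and fake_delete are
hypothetical names, and collecting failed URIs is an assumption made
for illustration, since the commit message does not say how failures
are handled.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, Iterable, List


def delete_all(executor: Executor,
               delete_file: Callable[[str], None],
               uris: Iterable[str]) -> List[str]:
    """Submit one delete per URI to the caller's executor.

    Returns the URIs whose delete raised an exception.
    """
    futures = [(uri, executor.submit(delete_file, uri)) for uri in uris]
    failed = []
    for uri, future in futures:
        # exception() blocks until the task finishes, then returns the
        # raised exception or None.
        if future.exception() is not None:
            failed.append(uri)
    return failed


deleted = []


def fake_delete(uri: str) -> None:
    """Hypothetical delete callback standing in for a FileIO delete."""
    if uri.endswith("bad"):
        raise IOError("cannot delete " + uri)
    deleted.append(uri)


# The pool is created and shut down by the caller, not by delete_all.
with ThreadPoolExecutor(max_workers=2) as pool:
    failures = delete_all(pool, fake_delete, ["f1", "f2", "bad"])
```

Passing the executor in, rather than creating one per call, lets the
caller bound the total concurrency across operations.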
Also fixes the toSql() implementation for all ALTER TABLE EXECUTE queries.
Testing:
- Add FE and EE tests.
Change-Id: I5979cdf15048d5a2c4784918533f65f32e888de0
Reviewed-on: http://gerrit.cloudera.org:8080/23042
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Zoltan Borok-Nagy <[email protected]>
> Delete orphan files for Iceberg table
> -------------------------------------
>
> Key: IMPALA-12337
> URL: https://issues.apache.org/jira/browse/IMPALA-12337
> Project: IMPALA
> Issue Type: New Feature
> Components: Catalog, Frontend
> Reporter: Baike Xia
> Assignee: Riza Suminto
> Priority: Major
> Labels: catalog-2024, impala-iceberg
>
> Removes all files from a table’s data directory that are not linked from
> metadata files and that are older than the value of the older_than
> parameter. Deleting orphan files from time to time is recommended to keep
> the size of a table’s data directory under control.
> {code:java}
> ALTER TABLE test_table EXECUTE remove_orphan_files(older_than = 1431691200)
> {code}
> See the syntax for expire_snapshot: IMPALA-11362