Hi folks,

I'd like to share recent progress on adding actions to copy tables between locations.

There is a constant need to copy tables between locations for purposes such as disaster recovery and testing. Because Iceberg metadata stores absolute file paths, copying the files alone does not work: the copied metadata still points at the source location. There are three generic solutions:

1. Rebuild the metadata: This is a proven approach widely used across various companies.
2. S3 access point: Effective when both the source and target locations are in S3, but not applicable to other storage systems.
3. Relative path: It requires changes to the table specification.

We focus on the first approach in this thread. The code was shared two years ago here <https://github.com/apache/iceberg/pull/4705>, but it was never merged. We picked it up again recently. Here are the active PRs related to this action. We would really appreciate any feedback and review:

- PR to add CopyTable action: https://github.com/apache/iceberg/pull/10024
- PR to add CheckSnapshotIntegrity action: https://github.com/apache/iceberg/pull/10642
- PR to add RemoveExpiredFiles action: https://github.com/apache/iceberg/pull/10643

Here is a Google Doc with more details to clarify the goals and approach:
https://docs.google.com/document/d/15oPj7ylgWQG8bhk_5aTjzHl7mlc-9f4OAH-oEpKavSc/edit?usp=sharing

Yufei