Hi Yufei. 

Thanks for the proposal. While the actions are great, they still have to do a 
lot of work that could be reduced once we have the relative path changes. I 
still support adding these actions, since moving data was out of scope for the 
relative path design, and we can use them as helpers once the spec change is 
done. 

Anurag Mantripragada

> On Jul 8, 2024, at 10:55 AM, Pucheng Yang <pucheng.yo...@gmail.com> wrote:
> 
> Thanks for picking this up, I think this is a very valuable addition.
> 
> On Mon, Jul 8, 2024 at 10:48 AM Yufei Gu <flyrain...@gmail.com> wrote:
>> Hi folks,
>> 
>> I'd like to share some recent progress on adding actions to copy tables 
>> across different places.
>> 
>> There is a constant need to copy tables across different places for purposes 
>> such as disaster recovery and testing. Due to the absolute file paths in 
>> Iceberg metadata, it doesn't work automatically. There are three generic 
>> solutions:
>> 1. Rebuild the metadata: This is a proven approach widely used across 
>> various companies.
>> 2. S3 access point: Effective when both the source and target locations are 
>> in S3, but not applicable to other storage systems.
>> 3. Relative path: It requires changes to the table specification.
>> 
>> We focus on the first approach in this thread. While the code was 
>> shared 2 years ago here <https://github.com/apache/iceberg/pull/4705>, it 
>> has never been merged. We picked it up recently. Here are the active PRs 
>> related to this action. Would really appreciate any feedback and review:
>> PR to add CopyTable action: https://github.com/apache/iceberg/pull/10024
>> PR to add CheckSnapshotIntegrity action: 
>> https://github.com/apache/iceberg/pull/10642
>> PR to add RemoveExpiredFiles action: 
>> https://github.com/apache/iceberg/pull/10643
>> Here is a google doc with more details to clarify the goals and approach: 
>> https://docs.google.com/document/d/15oPj7ylgWQG8bhk_5aTjzHl7mlc-9f4OAH-oEpKavSc/edit?usp=sharing
>> 
>> Yufei
