I think in most cases the copy table action doesn't require a query engine
to read and generate the new metadata files. This means it would be nice
to provide a pure Java implementation in core, which different engines,
like Spark, could extend or reuse to run the action in a distributed
manner when distributed execution is needed.

About the copy vs. relative path debate:
- I have seen the relative path requirement come up multiple times in the
past. It seems to be a feature requested by multiple users, so I think it
would be best to discuss it in a separate thread. The Copy Table Action
could be used to migrate absolute path tables to relative path tables
when such a migration is needed.
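
To make the "pure Java" point a bit more concrete, here is a minimal sketch
of the path rewriting that a metadata rebuild boils down to: replacing the
source location prefix with the target one in every absolute path. The
class and method names (PathPrefixRewriter, rewrite, rewriteAll) are
hypothetical and only for illustration; they are not taken from the PRs or
from the Iceberg API, and a real implementation would also have to read and
rewrite the metadata, manifest list, and manifest files themselves.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not part of the Iceberg API.
final class PathPrefixRewriter {
  private PathPrefixRewriter() {}

  // Swaps sourcePrefix for targetPrefix in a single absolute path.
  // Fails loudly if the path does not live under the source location,
  // so a file outside the table location is never silently kept as-is.
  static String rewrite(String path, String sourcePrefix, String targetPrefix) {
    String source = stripTrailingSlash(sourcePrefix);
    String target = stripTrailingSlash(targetPrefix);
    if (path.equals(source)) {
      return target;
    }
    // Require a "/" boundary so ".../t" does not match ".../t2/...".
    if (!path.startsWith(source + "/")) {
      throw new IllegalArgumentException(
          "Path " + path + " is not under source prefix " + source);
    }
    return target + path.substring(source.length());
  }

  // Applies the same rewrite to a list of paths, e.g. the data file
  // paths referenced by one manifest. No query engine is involved.
  static List<String> rewriteAll(
      List<String> paths, String sourcePrefix, String targetPrefix) {
    return paths.stream()
        .map(p -> rewrite(p, sourcePrefix, targetPrefix))
        .collect(Collectors.toList());
  }

  private static String stripTrailingSlash(String prefix) {
    return prefix.endsWith("/")
        ? prefix.substring(0, prefix.length() - 1)
        : prefix;
  }
}
```

Since the rewrite of each path is independent of the others, an engine like
Spark could apply the same function per manifest in parallel, while small
tables could be handled in a single JVM.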

On Mon, Jul 8, 2024, 21:52 Anurag Mantripragada
<amantriprag...@apple.com.invalid> wrote:

> Hi Yufei.
>
> Thanks for the proposal. While the actions are great, they still need to
> do a lot of work which can be reduced if we have the relative path changes.
> I still support adding these actions as moving data was out of scope for
> the relative path design and we can use these actions as helpers when the
> spec change is done.
>
> Anurag Mantripragada
>
> On Jul 8, 2024, at 10:55 AM, Pucheng Yang <pucheng.yo...@gmail.com> wrote:
>
> Thanks for picking this up, I think this is a very valuable addition.
>
> On Mon, Jul 8, 2024 at 10:48 AM Yufei Gu <flyrain...@gmail.com> wrote:
>
>> Hi folks,
>>
>> I'd like to share recent progress on adding actions to copy tables
>> across different places.
>>
>> There is a constant need to copy tables across different places for
>> purposes such as disaster recovery and testing. Due to the absolute file
>> paths in Iceberg metadata, it doesn't work automatically. There are three
>> generic solutions:
>> 1. Rebuild the metadata: This is a proven approach widely used across
>> various companies.
>> 2. S3 access point: Effective when both the source and target locations
>> are in S3, but not applicable to other storage systems.
>> 3. Relative path: It requires changes to the table specification.
>>
>> We focus on the first approach in this thread. While the code was
>> shared 2 years ago here <https://github.com/apache/iceberg/pull/4705>,
>> it was never merged. We picked it up recently. Here are the active PRs
>> related to this action. Would really appreciate any feedback and review:
>>
>>    - PR to add CopyTable action:
>>    https://github.com/apache/iceberg/pull/10024
>>    - PR to add CheckSnapshotIntegrity action:
>>    https://github.com/apache/iceberg/pull/10642
>>    - PR to add RemoveExpiredFiles action:
>>    https://github.com/apache/iceberg/pull/10643
>>
>> Here is a google doc with more details to clarify the goals and approach:
>> https://docs.google.com/document/d/15oPj7ylgWQG8bhk_5aTjzHl7mlc-9f4OAH-oEpKavSc/edit?usp=sharing
>>
>> Yufei
>>
>
>
