Re: Spark: Copy Table Action

2025-02-20 Thread Szehon Ho
conf. For example, if the file path is hdfs:///data/openhouse/db/tb_uuid, what is stored in Iceberg metadata is /data/openhouse/db/tb_uuid, and hdfs:// comes from Hadoop conf. Has the community considered an approach where the sche

Re: Spark: Copy Table Action

2025-02-20 Thread Pucheng Yang
ch where the scheme and cluster are minted by the catalog, to be used in the respective FileIO implementation for the blob stores. For example, if we had a bucket foo on us-east, and bucket bar on us-west, the catalog running on us-east would mint s3://foo, and

Re: Spark: Copy Table Action

2024-08-15 Thread Yufei Gu
running on us-east would mint s3://foo, and the catalog running on us-west would mint s3://bar, and the S3FileIO would join that with the rest of the relative path to the object. This would allow us to capture the absolute path relative to s3:// in the Iceberg metadata?
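
A minimal sketch of the prefix-joining idea described in the snippet above, assuming a catalog that hands out a per-region prefix. The class name RegionalCatalog, the method resolve, and the example relative path are hypothetical and are not part of Iceberg or S3FileIO; only the bucket names foo and bar come from the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionalCatalog:
    """Hypothetical catalog that mints a storage prefix for its region."""

    prefix: str  # e.g. "s3://foo" on us-east, "s3://bar" on us-west

    def resolve(self, relative_path: str) -> str:
        # Join the minted prefix with the relative path kept in table metadata.
        # Stray slashes on either side are trimmed so exactly one separator remains.
        return self.prefix.rstrip("/") + "/" + relative_path.lstrip("/")


us_east = RegionalCatalog("s3://foo")
us_west = RegionalCatalog("s3://bar")

# Made-up relative path; the metadata would store only this part.
relative = "db/tb_uuid/data/file-0001.parquet"

print(us_east.resolve(relative))
print(us_west.resolve(relative))
```

The same relative path resolves to a different absolute location depending on which catalog mints the prefix, which is the property the question in the thread is asking about. The sketch only covers prefixes of the form scheme://bucket; a bare scheme such as hdfs:// would need different handling.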

Re: Spark: Copy Table Action

2024-07-12 Thread Sumedh Sakdeo
Hi Yufei, I was wondering if we also want to support the use case of moving tables in this proposal? For example, users might have various reasons to change the table location; however, there is no good way to move original data files to the new location unless we are

Re: Spark: Copy Table Action

2024-07-11 Thread Pucheng Yang
Hi Yufei, I was wondering if we also want to support the use case of moving tables in this proposal? For example, users might have various reasons to change the table location; however, there is no good way to move original data files to the new location unless we are doing data files rewrite, but

Re: Spark: Copy Table Action

2024-07-10 Thread Ajantha Bhat
> > For RemoveExpiredFiles, I'm admittedly a bit skeptical if it's required > since orphan file removal should be able to cleanup the files in the > copied table. Are we able to elaborate why there's a concern with removing > snapshots on the copied table and subsequently relying on orphan file > r

Re: Spark: Copy Table Action

2024-07-09 Thread Amogh Jahagirdar
Thanks Yufei! +1 on having a copy table action, I think that's pretty valuable. I have some ideas on interfaces based on previous work I've done for region/multi-cloud replication of Iceberg tables. The absolute vs relative path discussion is interesting, I have some questions on how relative path

Re: Spark: Copy Table Action

2024-07-09 Thread Anurag Mantripragada
Agreed with Peter. I will bring the relative path changes up in the next community sync. I will help drive this. ~ Anurag Mantripragada On Jul 8, 2024, at 10:50 PM, Péter Váry wrote: I think in most cases the copy table action doesn't require a query engine to read and generate the

Re: Spark: Copy Table Action

2024-07-08 Thread Péter Váry
I think in most cases the copy table action doesn't require a query engine to read and generate the new metadata files. This means that it would be nice to provide a pure Java implementation in the core, and it could be extended/reused by different engines, like Spark, to execute it in a distribut

Re: Spark: Copy Table Action

2024-07-08 Thread Anurag Mantripragada
Hi Yufei. Thanks for the proposal. While the actions are great, they still need to do a lot of work which can be reduced if we have the relative path changes. I still support adding these actions as moving data was out of scope for the relative path design and we can use these actions as helpe

Re: Spark: Copy Table Action

2024-07-08 Thread Pucheng Yang
Thanks for picking this up, I think this is a very valuable addition. On Mon, Jul 8, 2024 at 10:48 AM Yufei Gu wrote: Hi folks, I'd like to share recent progress on adding actions to copy tables across different places. There is a constant need to copy tables across different place