Re: Iceberg disaster recovery and relative path sync-up

2021-09-28 Thread Anurag Mantripragada
Also, in S3’s case, my understanding is that instead of write.object-storage.path/write.data.path, users must now make sure that the location prefix is short to get the benefits of appending a hash to the data paths. For example, a large prefix like "s3://somebucket/region/timestamp/folder

Re: Iceberg disaster recovery and relative path sync-up

2021-09-23 Thread Anurag Mantripragada
Hi Russell, I don’t see any major issues with your approach other than that it may break some customizability of locations. If I understand correctly, today write.object-storage.path or write.metadata.path can be outside of the table base location. With your suggestion, are we saying that

Re: Iceberg disaster recovery and relative path sync-up

2021-09-22 Thread Russell Spitzer
During a sync with Yufei and Anurag I had some thoughts on this proposal that I wanted to share with the wider group. As Yufei has previously noted, I'm worried about the alternative configuration parameters like (folder-storage, object-storage). Specifically I'm thinking about the issue of movin

Re: Iceberg disaster recovery and relative path sync-up

2021-09-17 Thread Anurag Mantripragada
Hi everyone, Thanks for sharing your ideas and suggestions on this thread. I believe we have consensus on supporting multiple roots for a table and storing relative paths in metadata. We can start by adding this support in the initial phase. Yufei and I have updated the design doc[1] with the

Re: Iceberg disaster recovery and relative path sync-up

2021-09-02 Thread Ryan Blue
Yufei, answers inline: On Mon, Aug 23, 2021 at 4:06 PM Yufei Gu wrote: > @Ryan, how do these properties work with multiple table locations? > > 1. write.metadata.path > 2. write.folder-storage.path > 3. write.object-storage.path > > The current logic with single tab

Re: Iceberg disaster recovery and relative path sync-up

2021-09-02 Thread Ryan Blue
Jack, I agree with just about everything you've said. On Sun, Aug 29, 2021 at 2:13 PM Jack Ye wrote: > trying to catch up with the conversation here, just typing some of my > thought process: > > Based on my understanding, there are in general 2 use cases: > > 1. multiple data copies in differen

Re: [CWS] Re: Iceberg disaster recovery and relative path sync-up

2021-09-02 Thread Ryan Blue
Anjali, my thoughts are inline below: On Mon, Aug 23, 2021 at 1:14 PM Anjali Norwood wrote: > *"The more I think about this, the more I like the solution to add > multiple table roots to metadata, rather than removing table roots. Adding > a way to plug in a root selector makes a lot of sense to

Re: Iceberg disaster recovery and relative path sync-up

2021-09-02 Thread Ryan Blue
Are we planning to make sure that the tables with relative paths will always contain every data/metadata file in a single root folder? This depends on the use case. For table mirroring, there would be a back-end service making copies of all metadata, but not for other use cases. For example, the ren

Re: Iceberg disaster recovery and relative path sync-up

2021-08-30 Thread Yufei Gu
Jack and Ryan, one valid root at a time looks straightforward and good enough for use cases like DR and certain table migration cases. Here are questions about multiple valid roots at a time, which are needed by the federation and multiple storage tiers use cases. 1. Metadata sync-up questions

Re: Iceberg disaster recovery and relative path sync-up

2021-08-29 Thread Jack Ye
trying to catch up with the conversation here, just typing some of my thought process: Based on my understanding, there are in general 2 use cases: 1. multiple data copies in different physical storages, which includes: 1.1 disaster recovery: if 1 storage is completely down, all access needs to

Re: Iceberg disaster recovery and relative path sync-up

2021-08-23 Thread Yufei Gu
@Ryan, how do these properties work with multiple table locations? 1. write.metadata.path 2. write.folder-storage.path 3. write.object-storage.path The current logic with single table location is to honor these properties on top of table location. In case of multiple roots(ta

Re: Iceberg disaster recovery and relative path sync-up

2021-08-23 Thread Anjali Norwood
Hi Ryan, All, *"The more I think about this, the more I like the solution to add multiple table roots to metadata, rather than removing table roots. Adding a way to plug in a root selector makes a lot of sense to me and it ensures that the metadata is complete (table location is set in metadata) a

Re: Iceberg disaster recovery and relative path sync-up

2021-08-22 Thread Peter Vary
@Ryan: If I understand correctly, currently there is a possibility to change the root location of the table, and it will not change/move the old data/metadata files created before the change, only the new data/metadata files will be created in the new location. Are we planning to make sure that th

Re: Iceberg disaster recovery and relative path sync-up

2021-08-22 Thread Yufei Gu
Steven, here is my understanding. It depends on whether you want to move the data. In the DR case, we do move the data; we expect the data to be identical from time to time, but not always. In the case of S3 aliases, different roots actually point to the same location, there is no data move, and dat

Re: Iceberg disaster recovery and relative path sync-up

2021-08-22 Thread Steven Wu
For the multiple table roots, do we expect or ensure that the data are identical across the different roots? Or is this a best-effort background synchronization across the different roots? On Sun, Aug 22, 2021 at 11:53 AM Ryan Blue wrote: > Peter, I think that this feature would be useful when mov

Re: Iceberg disaster recovery and relative path sync-up

2021-08-22 Thread Ryan Blue
Peter, I think that this feature would be useful when moving tables between root locations or when you want to maintain multiple root locations. Renames are orthogonal because a rename doesn't change the table location. You may want to move the table after a rename, and this would help in that case

Re: Iceberg disaster recovery and relative path sync-up

2021-08-20 Thread Anjali Norwood
Hi, This thread is about disaster recovery and relative paths, but I wanted to ask an orthogonal but related question. Do we see disaster recovery as the only (or main) use case for multi-region? Is a data residency requirement a use case for anybody? Is it possible to shard an Iceberg table across

Re: Iceberg disaster recovery and relative path sync-up

2021-08-20 Thread Peter Vary
Sadly, I have missed the meeting :( Quick question: Was table rename / location change discussed for tables with relative paths? AFAIK when a table rename happens then we do not move old data / metadata files, we just change the root location of the new data / metadata files. If I am correct abou

Re: Iceberg disaster recovery and relative path sync-up

2021-08-13 Thread Anjali Norwood
Perfect, thank you Yufei. Regards Anjali On Thu, Aug 12, 2021 at 9:58 PM Yufei Gu wrote: > Hi Anjali, > > Inline... > On Thu, Aug 12, 2021 at 5:31 PM Anjali Norwood > wrote: > >> Thanks for the summary Yufei. >> Sorry, if this was already discussed, I missed the meeting yesterday. >> Is there

Re: Iceberg disaster recovery and relative path sync-up

2021-08-12 Thread Yufei Gu
Hi Anjali, Inline... On Thu, Aug 12, 2021 at 5:31 PM Anjali Norwood wrote: > Thanks for the summary Yufei. > Sorry, if this was already discussed, I missed the meeting yesterday. > Is there anything in the design that would prevent multiple roots from > being in different AWS regions? > No. DR i

Re: Iceberg disaster recovery and relative path sync-up

2021-08-12 Thread Anjali Norwood
Thanks for the summary Yufei. Sorry, if this was already discussed, I missed the meeting yesterday. Is there anything in the design that would prevent multiple roots from being in different AWS regions? For disaster recovery in the case of an entire AWS region down or slow, is metastore still a poi

Iceberg disaster recovery and relative path sync-up

2021-08-12 Thread Yufei Gu
Here is a summary of yesterday's community sync-up. Yufei gave a brief update on disaster recovery requirements and the current progress of the relative path approach. Ryan: We all agreed that relative path is the way for disaster recovery. *Multiple roots for the relative path* Ryan proposed an
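
For readers following the thread, here is a minimal sketch of the general idea discussed above: file paths are stored relative to a table root, and a selected root is applied when reading. This is an illustration only, not the design from the doc and not an Iceberg API. The class name TableRoots, its methods, the root names, and the bucket names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class TableRoots:
    """Hypothetical holder for several table roots (not an Iceberg class)."""

    roots: Dict[str, str]  # root name -> absolute location prefix

    def resolve(self, relative_path: str, root_name: str) -> str:
        """Turn a path stored relative to the table root into an absolute path."""
        root = self.roots[root_name].rstrip("/")
        return f"{root}/{relative_path.lstrip('/')}"

    def relativize(self, absolute_path: str, root_name: str) -> str:
        """Strip the chosen root from an absolute path before storing it."""
        root = self.roots[root_name].rstrip("/") + "/"
        if not absolute_path.startswith(root):
            raise ValueError(f"{absolute_path} is not under root {root_name!r}")
        return absolute_path[len(root):]


# Hypothetical roots: a primary location and a disaster-recovery copy.
table_roots = TableRoots(
    roots={
        "primary": "s3://bucket-a/db/tbl",
        "dr": "s3://bucket-b/db/tbl/",
    }
)

# The same relative path resolves under whichever root is selected.
primary_file = table_roots.resolve("data/00000-0.parquet", "primary")
dr_file = table_roots.resolve("data/00000-0.parquet", "dr")
```

Because only the relative part is stored, switching from the primary root to the DR root in this sketch requires no rewrite of the stored paths, which is the property the relative path discussion is after.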