For the multiple table roots, do we expect or ensure that the data are
identical across the different roots? Or is this best-effort background
synchronization across the different roots?
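
To make the question concrete, here is a minimal sketch of how I read the
proposal quoted below: a list of named roots in table metadata, root-relative
file paths, and a selector that picks a root from a property. All names here
(`TableMetadata`, `select_root`, `resolve`, the `selected-root` property key)
are hypothetical and not part of Iceberg, and where the property comes from
(catalog or table metadata) is still undecided in the thread.

```python
from dataclasses import dataclass
from typing import Dict, List, Mapping


@dataclass(frozen=True)
class TableMetadata:
    """Hypothetical metadata shape: named roots plus root-relative file paths."""
    roots: Dict[str, str]
    data_files: List[str]


def select_root(metadata: TableMetadata,
                properties: Mapping[str, str],
                key: str = "selected-root") -> str:
    """Pick a root by name from a property; default to the first listed root."""
    name = properties.get(key)
    if name is None:
        # No property set: use the first root in declaration order.
        return next(iter(metadata.roots.values()))
    if name not in metadata.roots:
        raise KeyError(f"unknown root: {name}")
    return metadata.roots[name]


def resolve(root: str, relative_path: str) -> str:
    """Join a root location and a relative path into an absolute location."""
    return root.rstrip("/") + "/" + relative_path.lstrip("/")


# Roots taken from the example in the sync-up summary.
metadata = TableMetadata(
    roots={
        "root1": "hdfs://nn:8020/path/to/the/table",
        "root2": "s3://bucket1/path/to/the/table",
        "root3": "s3://bucket2/path/to/the/table",
    },
    data_files=["data/00000.parquet"],  # illustrative file name
)

primary = [resolve(select_root(metadata, {}), f) for f in metadata.data_files]
backup = [
    resolve(select_root(metadata, {"selected-root": "root2"}), f)
    for f in metadata.data_files
]
```

If the roots are only synchronized on a best-effort basis, the same relative
path may resolve to a file that does not exist yet under the selected root,
which is why the consistency guarantee matters.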

On Sun, Aug 22, 2021 at 11:53 AM Ryan Blue <b...@tabular.io> wrote:

> Peter, I think that this feature would be useful when moving tables
> between root locations or when you want to maintain multiple root
> locations. Renames are orthogonal because a rename doesn't change the table
> location. You may want to move the table after a rename, and this would
> help in that case. But actually moving data is optional. That's why we put
> the table location in metadata.
>
> Anjali, DR is a big use case, but we also talked about directing accesses
> through other URLs, like S3 access points, table migration (like the rename
> case), and background data migration (e.g. lifting files between S3
> regions). There are a few uses for it.
>
> The more I think about this, the more I like the solution to add multiple
> table roots to metadata, rather than removing table roots. Adding a way to
> plug in a root selector makes a lot of sense to me and it ensures that the
> metadata is complete (table location is set in metadata) and that multiple
> locations can be used. Are there any objections or arguments against doing
> it that way?
>
> On Fri, Aug 20, 2021 at 9:00 AM Anjali Norwood
> <anorw...@netflix.com.invalid> wrote:
>
>> Hi,
>>
>> This thread is about disaster recovery and relative paths, but I wanted
>> to ask an orthogonal but related question.
>> Do we see disaster recovery as the only (or main) use case for
>> multi-region?
>> Is a data residency requirement a use case for anybody? Is it possible to
>> shard an Iceberg table across regions? How is the location managed in that
>> case?
>>
>> thanks,
>> Anjali.
>>
>> On Fri, Aug 20, 2021 at 12:20 AM Peter Vary <pv...@cloudera.com.invalid>
>> wrote:
>>
>>> Sadly, I have missed the meeting :(
>>>
>>> Quick question:
>>> Was table rename / location change discussed for tables with relative
>>> paths?
>>>
>>> AFAIK, when a table rename happens we do not move old data / metadata
>>> files; we just change the root location of the new data / metadata
>>> files. If I am correct about this, then we might need to handle this
>>> differently for tables with relative paths.
>>>
>>> Thanks, Peter
>>>
>>> On Fri, 13 Aug 2021, 15:12 Anjali Norwood, <anorw...@netflix.com.invalid>
>>> wrote:
>>>
>>>> Perfect, thank you Yufei.
>>>>
>>>> Regards
>>>> Anjali
>>>>
>>>> On Thu, Aug 12, 2021 at 9:58 PM Yufei Gu <flyrain...@gmail.com> wrote:
>>>>
>>>>> Hi Anjali,
>>>>>
>>>>> Inline...
>>>>> On Thu, Aug 12, 2021 at 5:31 PM Anjali Norwood
>>>>> <anorw...@netflix.com.invalid> wrote:
>>>>>
>>>>>> Thanks for the summary Yufei.
>>>>>> Sorry, if this was already discussed, I missed the meeting yesterday.
>>>>>> Is there anything in the design that would prevent multiple roots
>>>>>> from being in different AWS regions?
>>>>>>
>>>>> No. DR is the major use case of relative paths, if not the only one.
>>>>> So, it will support roots in different regions.
>>>>>
>>>>>> For disaster recovery in the case of an entire AWS region being down or
>>>>>> slow, is the metastore still a point of failure, or can a metastore be
>>>>>> stood up in a different region and select a different root?
>>>>>>
>>>>> Normally, DR also requires a backup metastore, besides the storage (S3
>>>>> bucket). In that case, the backup metastore will be in a different region
>>>>> along with the table files. For example, the primary table and its
>>>>> metastore are located in region A, and the backup table and its metastore
>>>>> are located in region B. The primary table root points to a path in
>>>>> region A, while the backup table root points to a path in region B.
>>>>>
>>>>>
>>>>>> regards,
>>>>>> Anjali.
>>>>>>
>>>>>> On Thu, Aug 12, 2021 at 11:35 AM Yufei Gu <flyrain...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Here is a summary of yesterday's community sync-up.
>>>>>>>
>>>>>>>
>>>>>>> Yufei gave a brief update on disaster recovery requirements and the
>>>>>>> current progress of the relative path approach.
>>>>>>>
>>>>>>>
>>>>>>> Ryan: We all agreed that relative path is the way for disaster
>>>>>>> recovery.
>>>>>>>
>>>>>>>
>>>>>>> *Multiple roots for the relative path*
>>>>>>>
>>>>>>> Ryan proposed an idea to enable multiple roots for a table.
>>>>>>> Basically, we can add a list of roots to the table metadata and use a
>>>>>>> selector to choose a different root when we move the table from one
>>>>>>> place to another. The selector reads a property to decide which root to
>>>>>>> use. The property could come from either the catalog or the table
>>>>>>> metadata, which is yet to be decided.
>>>>>>>
>>>>>>>
>>>>>>> Here is an example I’d imagine:
>>>>>>>
>>>>>>>    1. Root1: hdfs://nn:8020/path/to/the/table
>>>>>>>    2. Root2: s3://bucket1/path/to/the/table
>>>>>>>    3. Root3: s3://bucket2/path/to/the/table
>>>>>>>
>>>>>>> *Relative path use case*
>>>>>>>
>>>>>>> We brainstormed use cases for relative paths. Please let us know if
>>>>>>> there are any other use cases.
>>>>>>>
>>>>>>>    1. Disaster Recovery
>>>>>>>    2. Jack: AWS s3 bucket alias
>>>>>>>    3. Ryan: fall-back use case. In case root1 doesn’t work, the
>>>>>>>    table falls back to root2, then root3. As Russell mentioned, it
>>>>>>>    is challenging to do snapshot expiration and other table maintenance
>>>>>>>    actions in this setup.
>>>>>>>
>>>>>>> *Timeline*
>>>>>>>
>>>>>>> In terms of timeline, relative paths could be a feature in Spec V3,
>>>>>>> since Spec V1 and V2 assume absolute paths in metadata.
>>>>>>>
>>>>>>>
>>>>>>> *Misc*
>>>>>>>
>>>>>>>    1. Miao: How is the relative path compatible with the absolute
>>>>>>>    path?
>>>>>>>    2. How do we migrate an existing table? Build a tool for that.
>>>>>>>
>>>>>>> Please let us know if you have any ideas, questions, or concerns.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Yufei
>>>>>>>
>>>>>>
>
> --
> Ryan Blue
> Tabular
>
