Peter, I think that this feature would be useful when moving tables between
root locations or when you want to maintain multiple root locations.
Renames are orthogonal because a rename doesn't change the table location.
You may want to move the table after a rename, and this would help in that
case. But actually moving data is optional. That's why we put the table
location in metadata.

Anjali, DR is a big use case, but we also talked about directing accesses
through other URLs, like S3 access points, table migration (like the rename
case), and background data migration (e.g. lifting files between S3
regions). There are a few uses for it.

The more I think about this, the more I like the solution to add multiple
table roots to metadata, rather than removing table roots. Adding a way to
plug in a root selector makes a lot of sense to me and it ensures that the
metadata is complete (table location is set in metadata) and that multiple
locations can be used. Are there any objections or arguments against doing
it that way?
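
To make the selector idea concrete, here is a rough sketch. It is not a
proposal for the actual API. The names (`TableMetadata`, `select_root`,
`resolve`, and the `root-selector` property) are made up for illustration,
and it assumes named roots, file paths stored relative to a root, and a
catalog property that overrides a table property. Where the property
actually lives is still undecided.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TableMetadata:
    # Hypothetical shape: named roots; file paths are stored relative to a root.
    roots: Dict[str, str]
    properties: Dict[str, str] = field(default_factory=dict)


def select_root(metadata: TableMetadata,
                catalog_properties: Dict[str, str],
                key: str = "root-selector") -> str:
    """Pick a root by name. Assumption: catalog property wins over table property."""
    name = catalog_properties.get(key) or metadata.properties.get(key)
    if name is None:
        # No selector configured: fall back to the first root listed in metadata.
        return next(iter(metadata.roots.values()))
    if name not in metadata.roots:
        raise KeyError(f"unknown root: {name}")
    return metadata.roots[name]


def resolve(metadata: TableMetadata,
            relative_path: str,
            catalog_properties: Optional[Dict[str, str]] = None) -> str:
    """Turn a relative path from metadata into a full location under the selected root."""
    root = select_root(metadata, catalog_properties or {})
    return root.rstrip("/") + "/" + relative_path.lstrip("/")


meta = TableMetadata(roots={
    "root1": "hdfs://nn:8020/path/to/the/table",
    "root2": "s3://bucket1/path/to/the/table",
})
print(resolve(meta, "data/file-a.parquet"))
print(resolve(meta, "data/file-a.parquet", {"root-selector": "root2"}))
```

The point is that metadata stays complete (every root is recorded in it),
and the only thing that changes between environments is which root the
selector picks.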

On Fri, Aug 20, 2021 at 9:00 AM Anjali Norwood <anorw...@netflix.com.invalid>
wrote:

> Hi,
>
> This thread is about disaster recovery and relative paths, but I wanted to
> ask an orthogonal but related question.
> Do we see disaster recovery as the only (or main) use case for
> multi-region?
> Is data residency requirement a use case for anybody? Is it possible to
> shard an iceberg table across regions? How is the location managed in that
> case?
>
> thanks,
> Anjali.
>
> On Fri, Aug 20, 2021 at 12:20 AM Peter Vary <pv...@cloudera.com.invalid>
> wrote:
>
>> Sadly, I have missed the meeting :(
>>
>> Quick question:
>> Was table rename / location change discussed for tables with relative
>> paths?
>>
>> AFAIK when a table rename happens then we do not move old data / metadata
>> files, we just change the root location of the new data / metadata files.
>> If I am correct about this then we might need to handle this differently
>> for tables with relative paths.
>>
>> Thanks, Peter
>>
>> On Fri, 13 Aug 2021, 15:12 Anjali Norwood, <anorw...@netflix.com.invalid>
>> wrote:
>>
>>> Perfect, thank you Yufei.
>>>
>>> Regards
>>> Anjali
>>>
>>> On Thu, Aug 12, 2021 at 9:58 PM Yufei Gu <flyrain...@gmail.com> wrote:
>>>
>>>> Hi Anjali,
>>>>
>>>> Inline...
>>>> On Thu, Aug 12, 2021 at 5:31 PM Anjali Norwood
>>>> <anorw...@netflix.com.invalid> wrote:
>>>>
>>>>> Thanks for the summary Yufei.
>>>>> Sorry, if this was already discussed, I missed the meeting yesterday.
>>>>> Is there anything in the design that would prevent multiple roots from
>>>>> being in different aws regions?
>>>>>
>>>> No. DR is the major use case of relative paths, if not the only one.
>>>> So, it will support roots in different regions.
>>>>
>>>>> For disaster recovery in the case of an entire aws region down or slow,
>>>>> is metastore still a point of failure or can metastore be stood up in a
>>>>> different region and could select a different root?
>>>>>
>>>> Normally, DR also requires a backup metastore, besides the storage (S3
>>>> bucket). In that case, the backup metastore will be in a different region
>>>> along with the table files. For example, the primary table and its
>>>> metastore are located in region A, and the backup table and its metastore
>>>> are located in region B. The primary table root points to a path in region
>>>> A, while the backup table root points to a path in region B.
>>>>
>>>>
>>>>> regards,
>>>>> Anjali.
>>>>>
>>>>> On Thu, Aug 12, 2021 at 11:35 AM Yufei Gu <flyrain...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Here is a summary of yesterday's community sync-up.
>>>>>>
>>>>>>
>>>>>> Yufei gave a brief update on disaster recovery requirements and the
>>>>>> current progress of the relative path approach.
>>>>>>
>>>>>>
>>>>>> Ryan: We all agreed that relative path is the way for disaster
>>>>>> recovery.
>>>>>>
>>>>>>
>>>>>> *Multiple roots for the relative path*
>>>>>>
>>>>>> Ryan proposed an idea to enable multiple roots for a table.
>>>>>> Basically, we can add a list of roots in table metadata, and use a
>>>>>> selector to choose different roots when we move the table from one
>>>>>> place to another. The selector reads a property to decide which root
>>>>>> to use. The property could come from either the catalog or the table
>>>>>> metadata, which is yet to be decided.
>>>>>>
>>>>>>
>>>>>> Here is an example I’d imagine:
>>>>>>
>>>>>>    1. Root1: hdfs://nn:8020/path/to/the/table
>>>>>>    2. Root2: s3://bucket1/path/to/the/table
>>>>>>    3. Root3: s3://bucket2/path/to/the/table
>>>>>>
>>>>>> *Relative path use case*
>>>>>>
>>>>>> We brainstormed use cases for relative paths. Please let us know if
>>>>>> there are any other use cases.
>>>>>>
>>>>>>    1. Disaster Recovery
>>>>>>    2. Jack: AWS s3 bucket alias
>>>>>>    3. Ryan: fall-back use case. In case root1 doesn’t work, the
>>>>>>    table falls back to root2, then root3. As Russell mentioned, it is
>>>>>>    challenging to do snapshot expiration and other table maintenance
>>>>>>    actions.
>>>>>>
>>>>>>
>>>>>> *Timeline*
>>>>>>
>>>>>> In terms of timeline, relative path could be a feature in Spec V3,
>>>>>> since Spec V1 and V2 assume absolute path in metadata.
>>>>>>
>>>>>>
>>>>>> *Misc*
>>>>>>
>>>>>>    1. Miao: How is the relative path compatible with the absolute
>>>>>>    path?
>>>>>>    2. How do we migrate an existing table? Build a tool for that.
>>>>>>
>>>>>> Please let us know if you have any ideas, questions, or concerns.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Yufei
>>>>>>
>>>>>

-- 
Ryan Blue
Tabular
