Trying to catch up with the conversation here; just typing out some of my thought process:

Based on my understanding, there are in general two use cases:

1. Multiple data copies in different physical storages, which includes:
   1.1 Disaster recovery: if one storage is completely down, all access needs to be redirected to another storage.
   1.2 Multi-tier data access: a table is stored in multiple tiers, such as a fast single-availability-zone (AZ) layer, a common multi-AZ layer, and a slow but cheap archival layer.
   1.3 Table migration: a user can manually migrate a table to point to a completely different location with a completely different set of data files.
2. Different access points referring to the same storage, which includes:
   2.1 Access point: different names point to the same physical storage with different access policies.

I think we mostly reached the consensus that multiple root locations are needed. Now we need to ask the question: "Can a user access the storage from the catalog to at least get the table metadata and read all the different root locations?" I think the answer is Yes for 1.2 and 1.3, but No for 1.1 and 2.1. Therefore, we need to deal with those two situations separately.

For 1.1 and 2.1, the metadata cannot be accessed by the user unless an alternative root bucket/storage endpoint is provided at the catalog level. So this information should be part of the catalog API.

For 1.2 and 1.3, we need some more details about the characteristics of the root location, specifically:

1. whether the location is read-only or can also be written to. Typically we should only allow a single "master" write location, to avoid the complicated situation of data in different locations getting out of sync, unless a certain storage supports bi-directional sync.
2. how a reader/writer should choose which root to access. To achieve that, I think we need some root location resolver interface, with implementations like RegionBasedRootLocationResolver, AccessPolicyBasedRootLocationResolver, etc. (this is just my brainstorming; a rough sketch follows).
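
To make the brainstorming a little more concrete, here is roughly the shape I have in mind. None of these types exist in Iceberg today; all of the names below are placeholders:

    import java.util.List;

    // Rough sketch only: nothing here is designed or agreed on yet.
    public interface RootLocationResolver {

      // Caller context a resolver might need; purely illustrative.
      interface ResolverContext {
        String region();      // e.g. "us-east-1"
        boolean isWrite();    // writers would resolve to the single "master" write location
      }

      /**
       * Chooses which root location a reader/writer should use for this access.
       *
       * @param rootLocations all root locations declared for the table
       * @param context caller context (region, access policy, read vs. write, ...)
       * @return the root location to use
       */
      String resolve(List<String> rootLocations, ResolverContext context);
    }

    // Hypothetical implementations: RegionBasedRootLocationResolver (picks the root
    // co-located with the caller's region), AccessPolicyBasedRootLocationResolver
    // (picks the root the caller's access policy allows), etc.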

-Jack Ye

On Mon, Aug 23, 2021 at 4:06 PM Yufei Gu <flyrain...@gmail.com> wrote:

> @Ryan, how do these properties work with multiple table locations?
>
> 1. write.metadata.path
> 2. write.folder-storage.path
> 3. write.object-storage.path
>
> The current logic with a single table location is to honor these properties on top of the table location. In case of multiple roots (table locations), we cannot mix them. For example, write.object-storage.path and the table location should be in the same region for the DR use case.
>
> One of the solutions that keeps similar logic is to support all of these properties for each root, like this (see the sketch after the list):
>
> 1. Root1: (table location1, metadata path1, folder-storage path1, object-storage path1)
> 2. Root2: (table location2, metadata path2, folder-storage path2, object-storage path2)
> 3. Root3: (table location3, null, null, null)
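>
> To be clear about the grouping, something like the following is what I mean. This only illustrates the shape; I'm not proposing concrete property names or a metadata layout here:
>
>     // Illustrative only: each root carries its own group of location properties;
>     // unset entries fall back to defaults derived from that root's table location.
>     class RootLocations {
>       String tableLocation;      // required
>       String metadataPath;       // optional, e.g. null for Root3
>       String folderStoragePath;  // optional
>       String objectStoragePath;  // optional
>     }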
>
> What do you think?
>
> @Anjali, not sure supporting federation by using multiple roots is a good idea. Can we create a partition with a specific location to distinguish data between regions?
>
> On Mon, Aug 23, 2021 at 1:14 PM Anjali Norwood <anorw...@netflix.com.invalid> wrote:
>
>> Hi Ryan, All,
>>
>> *"The more I think about this, the more I like the solution to add multiple table roots to metadata, rather than removing table roots. Adding a way to plug in a root selector makes a lot of sense to me and it ensures that the metadata is complete (table location is set in metadata) and that multiple locations can be used. Are there any objections or arguments against doing it that way?"*
>>
>> In the context of your comment on multiple locations above, I am thinking about the following scenarios:
>>
>> 1) Disaster recovery, or a low-latency use case where clients connect to the region geographically closest to them: in this case, multiple table roots represent copies of the table, and the copies may or may not be in sync. (Most likely active-active replication would be set up in this case and the copies are near-identical.) A root-level selector works/makes sense.
>> 2) Data residency requirement: data must not leave a country/region. In this case, federation of data from multiple roots constitutes the entirety of the table.
>> 3) One can also imagine combinations of 1 and 2 above, where some locations need to be federated and some locations have data replicated from other locations.
>>
>> Curious how the 2nd and 3rd scenarios would be supported with this design.
>>
>> regards,
>> Anjali.
>>
>> On Sun, Aug 22, 2021 at 11:22 PM Peter Vary <pv...@cloudera.com.invalid> wrote:
>>
>>> @Ryan: If I understand correctly, currently there is a possibility to change the root location of the table, and it will not change/move the old data/metadata files created before the change; only the new data/metadata files will be created in the new location.
>>>
>>> Are we planning to make sure that tables with relative paths will always contain every data/metadata file in a single root folder?
>>>
>>> Here are a few scenarios which I am thinking about:
>>>
>>> 1. Table T1 is created with a relative path in HadoopCatalog/HiveCatalog where the root location is L1, and this location is generated from the TableIdentifier (at least at creation time).
>>> 2. Data is inserted into the table, so data files are created under L1, and the metadata files contain R1, a path relative to the current L1.
>>> 3a. The table location is updated to L2 (Hive: ALTER TABLE T1 SET LOCATION L2).
>>> 3b. The table is renamed to T2 (Hive: ALTER TABLE T1 RENAME TO T2) - the table location is updated for Hive tables if the old location was the default location.
>>> 4. When we try to read this table, we should read the old data/metadata files as well.
>>>
>>> So in both cases we have to move the old data/metadata files around like Hive does for the native tables, and for the tables with relative paths we do not have to change the metadata other than the root path? Will we do the same thing with other engines as well?
>>>
>>> Thanks,
>>> Peter
>>>
>>> On Mon, 23 Aug 2021, 06:38 Yufei Gu, <flyrain...@gmail.com> wrote:
>>>
>>>> Steven, here is my understanding. It depends on whether you want to move the data. In the DR case, we do move the data; we expect the data to be identical from time to time, but not always. In the case of S3 aliases, different roots actually point to the same location, there is no data move, and the data is identical for sure.
>>>>
>>>> On Sun, Aug 22, 2021 at 8:19 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>
>>>>> For the multiple table roots, do we expect or ensure that the data are identical across the different roots? Or is this best-effort background synchronization across the different roots?
>>>>>
>>>>> On Sun, Aug 22, 2021 at 11:53 AM Ryan Blue <b...@tabular.io> wrote:
>>>>>
>>>>>> Peter, I think that this feature would be useful when moving tables between root locations or when you want to maintain multiple root locations. Renames are orthogonal because a rename doesn't change the table location. You may want to move the table after a rename, and this would help in that case. But actually moving data is optional. That's why we put the table location in metadata.
>>>>>>
>>>>>> Anjali, DR is a big use case, but we also talked about directing accesses through other URLs, like S3 access points, table migration (like the rename case), and background data migration (e.g. lifting files between S3 regions). There are a few uses for it.
>>>>>>
>>>>>> The more I think about this, the more I like the solution to add multiple table roots to metadata, rather than removing table roots. Adding a way to plug in a root selector makes a lot of sense to me and it ensures that the metadata is complete (table location is set in metadata) and that multiple locations can be used. Are there any objections or arguments against doing it that way?
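>>>>>>
>>>>>> To anchor the discussion, selection and path resolution could end up looking roughly like this. This is only a sketch; the selector API and the property that configures it are not designed yet, and the names below are made up:
>>>>>>
>>>>>>     import java.util.List;
>>>>>>     import java.util.function.Function;
>>>>>>
>>>>>>     // Sketch only: metadata keeps a list of roots plus relative file paths,
>>>>>>     // a pluggable selector picks one root, and paths resolve against it.
>>>>>>     public class RootSelectionSketch {
>>>>>>       public static void main(String[] args) {
>>>>>>         // roots listed in table metadata (example values)
>>>>>>         List<String> roots = List.of(
>>>>>>             "hdfs://nn:8020/path/to/the/table",
>>>>>>             "s3://bucket1/path/to/the/table",
>>>>>>             "s3://bucket2/path/to/the/table");
>>>>>>
>>>>>>         // stand-in for the pluggable selector; the real one would be chosen
>>>>>>         // through a catalog or table property (where it lives is still open)
>>>>>>         Function<List<String>, String> selector = all -> all.get(0);
>>>>>>
>>>>>>         String root = selector.apply(roots);
>>>>>>         // file paths stored in metadata are relative, so they resolve against
>>>>>>         // whichever root was selected (the file name below is made up)
>>>>>>         System.out.println(root + "/data/part-00000.parquet");
>>>>>>       }
>>>>>>     }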
>>>>>>
>>>>>> On Fri, Aug 20, 2021 at 9:00 AM Anjali Norwood <anorw...@netflix.com.invalid> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> This thread is about disaster recovery and relative paths, but I wanted to ask an orthogonal but related question. Do we see disaster recovery as the only (or main) use case for multi-region? Is data residency requirement a use case for anybody? Is it possible to shard an Iceberg table across regions? How is the location managed in that case?
>>>>>>>
>>>>>>> thanks,
>>>>>>> Anjali.
>>>>>>>
>>>>>>> On Fri, Aug 20, 2021 at 12:20 AM Peter Vary <pv...@cloudera.com.invalid> wrote:
>>>>>>>
>>>>>>>> Sadly, I have missed the meeting :(
>>>>>>>>
>>>>>>>> Quick question: was table rename / location change discussed for tables with relative paths?
>>>>>>>>
>>>>>>>> AFAIK when a table rename happens, we do not move old data / metadata files; we just change the root location of the new data / metadata files. If I am correct about this, then we might need to handle this differently for tables with relative paths.
>>>>>>>>
>>>>>>>> Thanks, Peter
>>>>>>>>
>>>>>>>> On Fri, 13 Aug 2021, 15:12 Anjali Norwood, <anorw...@netflix.com.invalid> wrote:
>>>>>>>>
>>>>>>>>> Perfect, thank you Yufei.
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> Anjali
>>>>>>>>>
>>>>>>>>> On Thu, Aug 12, 2021 at 9:58 PM Yufei Gu <flyrain...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Anjali,
>>>>>>>>>>
>>>>>>>>>> Inline...
>>>>>>>>>>
>>>>>>>>>> On Thu, Aug 12, 2021 at 5:31 PM Anjali Norwood <anorw...@netflix.com.invalid> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks for the summary Yufei. Sorry if this was already discussed; I missed the meeting yesterday. Is there anything in the design that would prevent multiple roots from being in different AWS regions?
>>>>>>>>>>
>>>>>>>>>> No. DR is the major use case of relative paths, if not the only one. So, it will support roots in different regions.
>>>>>>>>>>
>>>>>>>>>>> For disaster recovery in the case of an entire AWS region being down or slow, is the metastore still a point of failure, or can a metastore be stood up in a different region and select a different root?
>>>>>>>>>>
>>>>>>>>>> Normally, DR also requires a backup metastore, besides the storage (S3 bucket). In that case, the backup metastore will be in a different region along with the table files. For example, the primary table is located in region A as well as its metastore, and the backup table is located in region B as well as its metastore. The primary table root points to a path in region A, while the backup table root points to a path in region B.
>>>>>>>>>>
>>>>>>>>>>> regards,
>>>>>>>>>>> Anjali.
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Aug 12, 2021 at 11:35 AM Yufei Gu <flyrain...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Here is a summary of yesterday's community sync-up.
>>>>>>>>>>>>
>>>>>>>>>>>> Yufei gave a brief update on disaster recovery requirements and the current progress of the relative path approach.
>>>>>>>>>>>>
>>>>>>>>>>>> Ryan: We all agreed that relative paths are the way to go for disaster recovery.
>>>>>>>>>>>>
>>>>>>>>>>>> *Multiple roots for the relative path*
>>>>>>>>>>>>
>>>>>>>>>>>> Ryan proposed an idea to enable multiple roots for a table: basically, we can add a list of roots to the table metadata, and use a selector to choose a different root when we move the table from one place to another. The selector reads a property to decide which root to use. The property could come either from the catalog or from the table metadata, which is yet to be decided.
>>>>>>>>>>>>
>>>>>>>>>>>> Here is an example I'd imagine:
>>>>>>>>>>>>
>>>>>>>>>>>> 1. Root1: hdfs://nn:8020/path/to/the/table
>>>>>>>>>>>> 2. Root2: s3://bucket1/path/to/the/table
>>>>>>>>>>>> 3. Root3: s3://bucket2/path/to/the/table
>>>>>>>>>>>>
>>>>>>>>>>>> *Relative path use case*
>>>>>>>>>>>>
>>>>>>>>>>>> We brainstormed use cases for relative paths. Please let us know if there are any other use cases.
>>>>>>>>>>>>
>>>>>>>>>>>> 1. Disaster recovery
>>>>>>>>>>>> 2. Jack: AWS S3 bucket alias
>>>>>>>>>>>> 3. Ryan: fall-back use case. In case root1 doesn't work, the table falls back to root2, then root3. As Russell mentioned, it is challenging to do snapshot expiration and other table maintenance actions.
>>>>>>>>>>>>
>>>>>>>>>>>> *Timeline*
>>>>>>>>>>>>
>>>>>>>>>>>> In terms of timeline, relative paths could be a feature in Spec V3, since Spec V1 and V2 assume absolute paths in metadata.
>>>>>>>>>>>>
>>>>>>>>>>>> *Misc*
>>>>>>>>>>>>
>>>>>>>>>>>> 1. Miao: How is the relative path compatible with the absolute path?
>>>>>>>>>>>> 2. How do we migrate an existing table? Build a tool for that.
>>>>>>>>>>>>
>>>>>>>>>>>> Please let us know if you have any ideas, questions, or concerns.
>>>>>>>>>>>>
>>>>>>>>>>>> Yufei
>>>>>>
>>>>>> --
>>>>>> Ryan Blue
>>>>>> Tabular