Jack, I agree with just about everything you've said.

On Sun, Aug 29, 2021 at 2:13 PM Jack Ye <yezhao...@gmail.com> wrote:

Trying to catch up with the conversation here, just typing out some of my thought process.

Based on my understanding, there are in general 2 use cases:

1. Multiple data copies in different physical storages, which includes:
1.1 Disaster recovery: if one storage is completely down, all access needs to be redirected to another storage.
1.2 Multi-tier data access: a table is stored in multiple tiers, such as a fast single-availability-zone (AZ) tier, a common multi-AZ tier, and a slow but cheap archival tier.
1.3 Table migration: a user can manually migrate a table to point to a completely different location with a completely different set of data files.

2. Different access points referring to the same storage, which includes:
2.1 Access points: different names are pointed to the same physical storage with different access policies.

I think we have mostly reached the consensus that multiple root locations are needed. Now we need to ask the question: "Can a user access the storage from the catalog to at least get the table metadata and read all the different root locations?" I think the answer is yes for 1.2 and 1.3, but no for 1.1 and 2.1. Therefore, we need to deal with those 2 situations separately.

For 1.1 and 2.1, the metadata cannot be accessed by the user unless an alternative root bucket/storage endpoint is provided at the catalog level, so this information should be put as a part of the catalog API.

For 1.2 and 1.3, we need some more details about the characteristics of each root location, specifically:
1. Whether the location is read-only or can also be written to. Typically we should only allow a single "master" write location, to avoid the complicated situation of data in different locations going out of sync, unless the storage supports bi-directional sync.
2. How a reader/writer should choose which root to access. To achieve that, I think we need a root location resolver interface, with implementations like RegionBasedRootLocationResolver, AccessPolicyBasedRootLocationResolver, etc. (this is just my brainstorming; a rough sketch follows below).

-Jack Ye
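To make the resolver brainstorming concrete, here is a minimal sketch of what such an interface might look like. Everything in it is hypothetical: the method signature, the "client.region" context key, and the naive region-matching logic are invented for illustration and are not part of Iceberg.

    import java.util.List;
    import java.util.Map;

    // Hypothetical interface; only the class names come from Jack's message.
    interface RootLocationResolver {
      // Choose one root from the table's configured root locations,
      // given caller context such as region or access policy.
      String resolve(List<String> rootLocations, Map<String, String> context);
    }

    // One possible implementation: prefer a root whose URI mentions the
    // caller's region, falling back to the first (primary) root.
    class RegionBasedRootLocationResolver implements RootLocationResolver {
      @Override
      public String resolve(List<String> rootLocations, Map<String, String> context) {
        String region = context.get("client.region"); // assumed context key
        if (region != null) {
          for (String root : rootLocations) {
            if (root.contains(region)) { // naive match, e.g. "us-west-2" in the bucket name
              return root;
            }
          }
        }
        return rootLocations.get(0); // default: the primary root
      }
    }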
On Mon, Aug 23, 2021 at 4:06 PM Yufei Gu <flyrain...@gmail.com> wrote:

@Ryan, how do these properties work with multiple table locations?

1. write.metadata.path
2. write.folder-storage.path
3. write.object-storage.path

The current logic with a single table location is to honor these properties on top of the table location. In the case of multiple roots (table locations), we cannot mix them. For example, write.object-storage.path and the table location should be in the same region for the DR use case.

One solution that keeps the logic similar is to support all of these properties for each root, like this (sketched after this message):

1. Root1: (table location1, metadata path1, folder-storage path1, object-storage path1)
2. Root2: (table location2, metadata path2, folder-storage path2, object-storage path2)
3. Root3: (table location3, null, null, null)

What do you think?

@Anjali, I am not sure supporting federation by using multiple roots is a good idea. Can we create a partition with a specific location to distinguish data between regions?
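For illustration only, the per-root grouping above could be expressed as prefixed table properties. This is a minimal sketch under the assumption of hypothetical keys like root1.location; none of these keys exist in the Iceberg spec.

    import java.util.HashMap;
    import java.util.Map;

    class PerRootPropertiesExample {
      // Hypothetical keys illustrating the (location, metadata path,
      // folder-storage path, object-storage path) tuple per root.
      static Map<String, String> example() {
        Map<String, String> props = new HashMap<>();
        props.put("root1.location", "s3://primary-bucket/db/table");
        props.put("root1.write.metadata.path", "s3://primary-bucket/db/table/metadata");
        props.put("root1.write.folder-storage.path", "s3://primary-bucket/db/table/data");
        props.put("root1.write.object-storage.path", "s3://primary-objects/db/table");
        props.put("root2.location", "s3://backup-bucket/db/table");
        props.put("root2.write.metadata.path", "s3://backup-bucket/db/table/metadata");
        props.put("root2.write.folder-storage.path", "s3://backup-bucket/db/table/data");
        props.put("root2.write.object-storage.path", "s3://backup-objects/db/table");
        // Root3 sets only a location, matching "(table location3, null, null, null)";
        // the write-path properties would fall back to defaults under that root.
        props.put("root3.location", "hdfs://nn:8020/db/table");
        return props;
      }
    }

The point the sketch tries to capture is Yufei's constraint: every write path is anchored under exactly one root, so a DR copy never mixes regions.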
On Mon, Aug 23, 2021 at 1:14 PM Anjali Norwood <anorw...@netflix.com.invalid> wrote:

Hi Ryan, All,

*"The more I think about this, the more I like the solution to add multiple table roots to metadata, rather than removing table roots. Adding a way to plug in a root selector makes a lot of sense to me and it ensures that the metadata is complete (table location is set in metadata) and that multiple locations can be used. Are there any objections or arguments against doing it that way?"*

In the context of your comment on multiple locations above, I am thinking about the following scenarios:
1) Disaster recovery, or a low-latency use case where clients connect to the region geographically closest to them: here, multiple table roots represent copies of the table, and the copies may or may not be in sync. (Most likely active-active replication would be set up in this case, and the copies would be near-identical.) A root-level selector works/makes sense.
2) Data residency requirement: data must not leave a country/region. In this case, the federation of data from multiple roots constitutes the entirety of the table.
3) One can also imagine combinations of 1 and 2 above, where some locations need to be federated and some locations have data replicated from other locations.

Curious how the 2nd and 3rd scenarios would be supported with this design.

regards,
Anjali.

On Sun, Aug 22, 2021 at 11:22 PM Peter Vary <pv...@cloudera.com.invalid> wrote:

@Ryan: If I understand correctly, it is currently possible to change the root location of a table, and doing so will not change/move the old data/metadata files created before the change; only the new data/metadata files will be created in the new location.

Are we planning to make sure that tables with relative paths will always contain every data/metadata file in a single root folder?

Here are a few scenarios I am thinking about:
1. Table T1 is created with a relative path in HadoopCatalog/HiveCatalog, where the root location is L1, and this location is generated from the TableIdentifier (at least at creation time).
2. Data is inserted into the table, so data files are created under L1, and the metadata files contain the relative path R1 against the current L1.
3a. The table location is updated to L2 (Hive: ALTER TABLE T1 SET LOCATION L2).
3b. The table is renamed to T2 (Hive: ALTER TABLE T1 RENAME TO T2). The table location is updated for Hive tables if the old location was the default location.
4. When we try to read this table, we should read the old data/metadata files as well.

So in both cases we have to move the old data/metadata files around like Hive does for its native tables, while for tables with relative paths we do not have to change the metadata other than the root path? Will we do the same thing with other engines as well?

Thanks,
Peter

On Mon, 23 Aug 2021, 06:38 Yufei Gu <flyrain...@gmail.com> wrote:

Steven, here is my understanding. It depends on whether you want to move the data. In the DR case, we do move the data; we expect the copies to be identical from time to time, but not always. In the case of S3 aliases, different roots actually point to the same location, so there is no data movement and the data is identical for sure.

On Sun, Aug 22, 2021 at 8:19 PM Steven Wu <stevenz...@gmail.com> wrote:

For the multiple table roots, do we expect or ensure that the data are identical across the different roots? Or is this best-effort background synchronization across the different roots?
On Sun, Aug 22, 2021 at 11:53 AM Ryan Blue <b...@tabular.io> wrote:

Peter, I think that this feature would be useful when moving tables between root locations or when you want to maintain multiple root locations. Renames are orthogonal, because a rename doesn't change the table location. You may want to move the table after a rename, and this would help in that case. But actually moving the data is optional. That's why we put the table location in metadata.

Anjali, DR is a big use case, but we also talked about directing accesses through other URLs, like S3 access points, table migration (like the rename case), and background data migration (e.g. lifting files between S3 regions). There are a few uses for it.

The more I think about this, the more I like the solution to add multiple table roots to metadata, rather than removing table roots. Adding a way to plug in a root selector makes a lot of sense to me and it ensures that the metadata is complete (the table location is set in metadata) and that multiple locations can be used. Are there any objections or arguments against doing it that way?

--
Ryan Blue
Tabular
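One possible wiring for the pluggable selector described above, reusing the RootLocationResolver interface sketched earlier in this thread. The property key root-selector.impl and the reflective loading are assumptions for illustration, not existing Iceberg behavior; per the discussion, the property could come from either the catalog or the table metadata.

    import java.util.List;
    import java.util.Map;

    class RootSelectorWiring {
      // Hypothetical property key naming the resolver implementation.
      static final String SELECTOR_IMPL = "root-selector.impl";

      static String selectRoot(Map<String, String> properties, List<String> roots,
                               Map<String, String> context) throws Exception {
        String impl = properties.getOrDefault(
            SELECTOR_IMPL, RegionBasedRootLocationResolver.class.getName());
        // Load the configured resolver reflectively, as pluggable impls usually are.
        RootLocationResolver resolver = (RootLocationResolver)
            Class.forName(impl).getDeclaredConstructor().newInstance();
        return resolver.resolve(roots, context);
      }
    }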
On Fri, Aug 20, 2021 at 9:00 AM Anjali Norwood <anorw...@netflix.com.invalid> wrote:

Hi,

This thread is about disaster recovery and relative paths, but I wanted to ask an orthogonal but related question. Do we see disaster recovery as the only (or main) use case for multi-region? Is a data residency requirement a use case for anybody? Is it possible to shard an Iceberg table across regions? How is the location managed in that case?

thanks,
Anjali.

On Fri, Aug 20, 2021 at 12:20 AM Peter Vary <pv...@cloudera.com.invalid> wrote:

Sadly, I missed the meeting :(

Quick question: was table rename / location change discussed for tables with relative paths?

AFAIK when a table rename happens, we do not move the old data/metadata files; we just change the root location for the new data/metadata files. If I am correct about this, then we might need to handle renames differently for tables with relative paths.

Thanks, Peter

On Fri, 13 Aug 2021, 15:12 Anjali Norwood <anorw...@netflix.com.invalid> wrote:

Perfect, thank you Yufei.

Regards,
Anjali

On Thu, Aug 12, 2021 at 9:58 PM Yufei Gu <flyrain...@gmail.com> wrote:

Hi Anjali,

Inline...

On Thu, Aug 12, 2021 at 5:31 PM Anjali Norwood <anorw...@netflix.com.invalid> wrote:

> Thanks for the summary, Yufei. Sorry if this was already discussed; I missed the meeting yesterday.
> Is there anything in the design that would prevent multiple roots from being in different AWS regions?

No. DR is the major use case for relative paths, if not the only one. So yes, it will support roots in different regions.

> For disaster recovery, in the case of an entire AWS region being down or slow, is the metastore still a point of failure, or can a metastore be stood up in a different region and select a different root?

Normally, DR also requires a backup metastore besides the storage (the S3 bucket). In that case, the backup metastore will be in a different region along with the table files. For example, the primary table is located in region A along with its metastore, and the backup table is located in region B along with its metastore. The primary table root points to a path in region A, while the backup table root points to a path in region B.

> regards,
> Anjali.

On Thu, Aug 12, 2021 at 11:35 AM Yufei Gu <flyrain...@gmail.com> wrote:

Here is a summary of yesterday's community sync-up.

Yufei gave a brief update on the disaster recovery requirements and the current progress of the relative path approach.

Ryan: We all agreed that relative paths are the way to support disaster recovery.

*Multiple roots for the relative path*

Ryan proposed an idea to enable multiple roots for a table: basically, we can add a list of roots to table metadata and use a selector to choose a different root when we move the table from one place to another. The selector reads a property to decide which root to use. The property could come either from the catalog or from the table metadata, which is yet to be decided.

Here is an example I'd imagine (picked up in the sketch at the end of this message):

1. Root1: hdfs://nn:8020/path/to/the/table
2. Root2: s3://bucket1/path/to/the/table
3. Root3: s3://bucket2/path/to/the/table

*Relative path use cases*

We brainstormed use cases for relative paths. Please let us know if there are any other use cases.

1. Disaster recovery
2. Jack: AWS S3 bucket aliases
3. Ryan: fall-back use case. In case root1 doesn't work, the table falls back to root2, then root3. As Russell mentioned, this makes snapshot expiration and other table maintenance actions challenging.

*Timeline*

In terms of timeline, relative paths could be a feature in Spec V3, since Spec V1 and V2 assume absolute paths in metadata.

*Misc*

1. Miao: How is the relative path compatible with the absolute path?
2. How do we migrate an existing table? Build a tool for that.

Please let us know if you have any ideas, questions, or concerns.

Yufei
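To ground the summary's example: assuming manifests store paths relative to the chosen root (which is the yet-to-be-specified part), resolution could be simple prefixing over the three roots above. The helper and the manifest entry below are invented for illustration.

    import java.util.List;

    class RelativePathExample {
      // Yufei's example roots, ordered for Ryan's fall-back idea:
      // try Root1 first, then Root2, then Root3.
      static final List<String> ROOTS = List.of(
          "hdfs://nn:8020/path/to/the/table",
          "s3://bucket1/path/to/the/table",
          "s3://bucket2/path/to/the/table");

      // A manifest entry would store only the part below the root.
      static String absolutePath(String root, String relative) {
        return root.endsWith("/") ? root + relative : root + "/" + relative;
      }

      public static void main(String[] args) {
        String rel = "data/part-00000.parquet"; // hypothetical manifest entry
        for (String root : ROOTS) {
          System.out.println(absolutePath(root, rel));
        }
        // hdfs://nn:8020/path/to/the/table/data/part-00000.parquet
        // s3://bucket1/path/to/the/table/data/part-00000.parquet
        // s3://bucket2/path/to/the/table/data/part-00000.parquet
      }
    }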
--
Ryan Blue
Tabular