Trying to catch up with the conversation here; just typing out some of my thought process:

Based on my understanding, there are in general two use cases:

1. Multiple data copies in different physical storages, which includes:
   1.1 Disaster recovery: if one storage is completely down, all access needs to be redirected to another storage.
   1.2 Multi-tier data access: a table is stored in multiple tiers, such as a fast single-availability-zone (AZ) layer, a common multi-AZ layer, and a slow but cheap archival layer.
   1.3 Table migration: a user can manually migrate a table to point to a completely different location with a completely different set of data files.
2. Different access points referring to the same storage, which includes:
   2.1 Access point: different names point to the same physical storage with different access policies.

I think we mostly reached the consensus that multiple root locations are needed. Now we need to ask the question: "Can a user access the storage from the catalog to at least get the table metadata and read all the different root locations?" I think the answer is Yes for 1.2 and 1.3, but No for 1.1 and 2.1. Therefore, we need to deal with those two situations separately.

For 1.1 and 2.1, the metadata cannot be accessed by the user unless an alternative root bucket/storage endpoint is provided at the catalog level. So this information should be part of the catalog API.

For 1.2 and 1.3, we need some more details about the characteristics of the root location, specifically:

1. whether the location is read-only or can also be written to. Typically we should only allow a single "master" write location, to avoid the complicated situation of data in different locations getting out of sync, unless a certain storage supports bi-directional sync.
2. how a reader/writer should choose which root to access. To achieve that, I think we need some root location resolver interface, with implementations like RegionBasedRootLocationResolver, AccessPolicyBasedRootLocationResolver, etc. (this is just my brainstorming; a rough sketch follows).
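
To make the brainstorming a little more concrete, here is roughly the shape I have in mind. None of these types exist in Iceberg today; all of the names below are placeholders:

    import java.util.List;

    // Rough sketch only: nothing here is designed or agreed on yet.
    public interface RootLocationResolver {

      // Caller context a resolver might need; purely illustrative.
      interface ResolverContext {
        String region();      // e.g. "us-east-1"
        boolean isWrite();    // writers would resolve to the single "master" write location
      }

      /**
       * Chooses which root location a reader/writer should use for this access.
       *
       * @param rootLocations all root locations declared for the table
       * @param context caller context (region, access policy, read vs. write, ...)
       * @return the root location to use
       */
      String resolve(List<String> rootLocations, ResolverContext context);
    }

    // Hypothetical implementations: RegionBasedRootLocationResolver (picks the root
    // co-located with the caller's region), AccessPolicyBasedRootLocationResolver
    // (picks the root the caller's access policy allows), etc.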

-Jack Ye

On Mon, Aug 23, 2021 at 4:06 PM Yufei Gu <flyrain...@gmail.com> wrote:

> @Ryan, how do these properties work with multiple table locations?
>
> 1. write.metadata.path
> 2. write.folder-storage.path
> 3. write.object-storage.path
>
> The current logic with a single table location is to honor these properties on top of the table location. In case of multiple roots (table locations), we cannot mix them. For example, write.object-storage.path and the table location should be in the same region for the DR use case.
>
> One of the solutions that keeps similar logic is to support all of these properties for each root, like this (see the sketch after the list):
>
> 1. Root1: (table location1, metadata path1, folder-storage path1, object-storage path1)
> 2. Root2: (table location2, metadata path2, folder-storage path2, object-storage path2)
> 3. Root3: (table location3, null, null, null)
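>
> To be clear about the grouping, something like the following is what I mean. This only illustrates the shape; I'm not proposing concrete property names or a metadata layout here:
>
>     // Illustrative only: each root carries its own group of location properties;
>     // unset entries fall back to defaults derived from that root's table location.
>     class RootLocations {
>       String tableLocation;      // required
>       String metadataPath;       // optional, e.g. null for Root3
>       String folderStoragePath;  // optional
>       String objectStoragePath;  // optional
>     }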
>
> What do you think?
>
> @Anjali, not sure supporting federation by using multiple roots is a good idea. Can we create a partition with a specific location to distinguish data between regions?
>
> On Mon, Aug 23, 2021 at 1:14 PM Anjali Norwood <anorw...@netflix.com.invalid> wrote:
>
>> Hi Ryan, All,
>>
>> *"The more I think about this, the more I like the solution to add multiple table roots to metadata, rather than removing table roots. Adding a way to plug in a root selector makes a lot of sense to me and it ensures that the metadata is complete (table location is set in metadata) and that multiple locations can be used. Are there any objections or arguments against doing it that way?"*
>>
>> In the context of your comment on multiple locations above, I am thinking about the following scenarios:
>>
>> 1) Disaster recovery, or a low-latency use case where clients connect to the region geographically closest to them: in this case, multiple table roots represent copies of the table, and the copies may or may not be in sync. (Most likely active-active replication would be set up in this case and the copies are near-identical.) A root-level selector works/makes sense.
>> 2) Data residency requirement: data must not leave a country/region. In this case, federation of data from multiple roots constitutes the entirety of the table.
>> 3) One can also imagine combinations of 1 and 2 above, where some locations need to be federated and some locations have data replicated from other locations.
>>
>> Curious how the 2nd and 3rd scenarios would be supported with this design.
>>
>> regards,
>> Anjali.
>>
>> On Sun, Aug 22, 2021 at 11:22 PM Peter Vary <pv...@cloudera.com.invalid> wrote:
>>
>>> @Ryan: If I understand correctly, currently there is a possibility to change the root location of the table, and it will not change/move the old data/metadata files created before the change; only the new data/metadata files will be created in the new location.
>>>
>>> Are we planning to make sure that tables with relative paths will always contain every data/metadata file in a single root folder?
>>>
>>> Here are a few scenarios which I am thinking about:
>>>
>>> 1. Table T1 is created with a relative path in HadoopCatalog/HiveCatalog where the root location is L1, and this location is generated from the TableIdentifier (at least at creation time).
>>> 2. Data is inserted into the table, so data files are created under L1, and the metadata files contain R1, a path relative to the current L1.
>>> 3a. The table location is updated to L2 (Hive: ALTER TABLE T1 SET LOCATION L2).
>>> 3b. The table is renamed to T2 (Hive: ALTER TABLE T1 RENAME TO T2) - the table location is updated for Hive tables if the old location was the default location.
>>> 4. When we try to read this table, we should read the old data/metadata files as well.
>>>
>>> So in both cases we have to move the old data/metadata files around like Hive does for the native tables, and for the tables with relative paths we do not have to change the metadata other than the root path? Will we do the same thing with other engines as well?
>>>
>>> Thanks,
>>> Peter
>>>
>>> On Mon, 23 Aug 2021, 06:38 Yufei Gu, <flyrain...@gmail.com> wrote:
>>>
>>>> Steven, here is my understanding. It depends on whether you want to move the data. In the DR case, we do move the data; we expect the data to be identical from time to time, but not always. In the case of S3 aliases, different roots actually point to the same location, there is no data move, and the data is identical for sure.
>>>>
>>>> On Sun, Aug 22, 2021 at 8:19 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>
>>>>> For the multiple table roots, do we expect or ensure that the data are identical across the different roots? Or is this best-effort background synchronization across the different roots?
>>>>>
>>>>> On Sun, Aug 22, 2021 at 11:53 AM Ryan Blue <b...@tabular.io> wrote:
>>>>>
>>>>>> Peter, I think that this feature would be useful when moving tables between root locations or when you want to maintain multiple root locations. Renames are orthogonal because a rename doesn't change the table location. You may want to move the table after a rename, and this would help in that case. But actually moving data is optional. That's why we put the table location in metadata.
>>>>>>
>>>>>> Anjali, DR is a big use case, but we also talked about directing accesses through other URLs, like S3 access points, table migration (like the rename case), and background data migration (e.g. lifting files between S3 regions). There are a few uses for it.
>>>>>>
>>>>>> The more I think about this, the more I like the solution to add multiple table roots to metadata, rather than removing table roots. Adding a way to plug in a root selector makes a lot of sense to me and it ensures that the metadata is complete (table location is set in metadata) and that multiple locations can be used. Are there any objections or arguments against doing it that way?
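>>>>>>
>>>>>> To anchor the discussion, selection and path resolution could end up looking roughly like this. This is only a sketch; the selector API and the property that configures it are not designed yet, and the names below are made up:
>>>>>>
>>>>>>     import java.util.List;
>>>>>>     import java.util.function.Function;
>>>>>>
>>>>>>     // Sketch only: metadata keeps a list of roots plus relative file paths,
>>>>>>     // a pluggable selector picks one root, and paths resolve against it.
>>>>>>     public class RootSelectionSketch {
>>>>>>       public static void main(String[] args) {
>>>>>>         // roots listed in table metadata (example values)
>>>>>>         List<String> roots = List.of(
>>>>>>             "hdfs://nn:8020/path/to/the/table",
>>>>>>             "s3://bucket1/path/to/the/table",
>>>>>>             "s3://bucket2/path/to/the/table");
>>>>>>
>>>>>>         // stand-in for the pluggable selector; the real one would be chosen
>>>>>>         // through a catalog or table property (where it lives is still open)
>>>>>>         Function<List<String>, String> selector = all -> all.get(0);
>>>>>>
>>>>>>         String root = selector.apply(roots);
>>>>>>         // file paths stored in metadata are relative, so they resolve against
>>>>>>         // whichever root was selected (the file name below is made up)
>>>>>>         System.out.println(root + "/data/part-00000.parquet");
>>>>>>       }
>>>>>>     }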
>>>>>>
>>>>>> On Fri, Aug 20, 2021 at 9:00 AM Anjali Norwood <anorw...@netflix.com.invalid> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> This thread is about disaster recovery and relative paths, but I wanted to ask an orthogonal but related question. Do we see disaster recovery as the only (or main) use case for multi-region? Is data residency requirement a use case for anybody? Is it possible to shard an Iceberg table across regions? How is the location managed in that case?
>>>>>>>
>>>>>>> thanks,
>>>>>>> Anjali.
>>>>>>>
>>>>>>> On Fri, Aug 20, 2021 at 12:20 AM Peter Vary <pv...@cloudera.com.invalid> wrote:
>>>>>>>
>>>>>>>> Sadly, I have missed the meeting :(
>>>>>>>>
>>>>>>>> Quick question: was table rename / location change discussed for tables with relative paths?
>>>>>>>>
>>>>>>>> AFAIK when a table rename happens, we do not move old data / metadata files; we just change the root location of the new data / metadata files. If I am correct about this, then we might need to handle this differently for tables with relative paths.
>>>>>>>>
>>>>>>>> Thanks, Peter
>>>>>>>>
>>>>>>>> On Fri, 13 Aug 2021, 15:12 Anjali Norwood, <anorw...@netflix.com.invalid> wrote:
>>>>>>>>
>>>>>>>>> Perfect, thank you Yufei.
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> Anjali
>>>>>>>>>
>>>>>>>>> On Thu, Aug 12, 2021 at 9:58 PM Yufei Gu <flyrain...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Anjali,
>>>>>>>>>>
>>>>>>>>>> Inline...
>>>>>>>>>>
>>>>>>>>>> On Thu, Aug 12, 2021 at 5:31 PM Anjali Norwood <anorw...@netflix.com.invalid> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks for the summary Yufei. Sorry if this was already discussed; I missed the meeting yesterday. Is there anything in the design that would prevent multiple roots from being in different AWS regions?
>>>>>>>>>>
>>>>>>>>>> No. DR is the major use case of relative paths, if not the only one. So, it will support roots in different regions.
>>>>>>>>>>
>>>>>>>>>>> For disaster recovery in the case of an entire AWS region being down or slow, is the metastore still a point of failure, or can a metastore be stood up in a different region and select a different root?
>>>>>>>>>>
>>>>>>>>>> Normally, DR also requires a backup metastore, besides the storage (S3 bucket). In that case, the backup metastore will be in a different region along with the table files. For example, the primary table is located in region A as well as its metastore, and the backup table is located in region B as well as its metastore. The primary table root points to a path in region A, while the backup table root points to a path in region B.
>>>>>>>>>>
>>>>>>>>>>> regards,
>>>>>>>>>>> Anjali.
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Aug 12, 2021 at 11:35 AM Yufei Gu <flyrain...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Here is a summary of yesterday's community sync-up.
>>>>>>>>>>>>
>>>>>>>>>>>> Yufei gave a brief update on disaster recovery requirements and the current progress of the relative path approach.
>>>>>>>>>>>>
>>>>>>>>>>>> Ryan: We all agreed that relative paths are the way to go for disaster recovery.
>>>>>>>>>>>>
>>>>>>>>>>>> *Multiple roots for the relative path*
>>>>>>>>>>>>
>>>>>>>>>>>> Ryan proposed an idea to enable multiple roots for a table: basically, we can add a list of roots to the table metadata, and use a selector to choose a different root when we move the table from one place to another. The selector reads a property to decide which root to use. The property could come either from the catalog or from the table metadata, which is yet to be decided.
>>>>>>>>>>>>
>>>>>>>>>>>> Here is an example I'd imagine:
>>>>>>>>>>>>
>>>>>>>>>>>> 1. Root1: hdfs://nn:8020/path/to/the/table
>>>>>>>>>>>> 2. Root2: s3://bucket1/path/to/the/table
>>>>>>>>>>>> 3. Root3: s3://bucket2/path/to/the/table
>>>>>>>>>>>>
>>>>>>>>>>>> *Relative path use case*
>>>>>>>>>>>>
>>>>>>>>>>>> We brainstormed use cases for relative paths. Please let us know if there are any other use cases.
>>>>>>>>>>>>
>>>>>>>>>>>> 1. Disaster recovery
>>>>>>>>>>>> 2. Jack: AWS S3 bucket alias
>>>>>>>>>>>> 3. Ryan: fall-back use case. In case root1 doesn't work, the table falls back to root2, then root3. As Russell mentioned, it is challenging to do snapshot expiration and other table maintenance actions.
>>>>>>>>>>>>
>>>>>>>>>>>> *Timeline*
>>>>>>>>>>>>
>>>>>>>>>>>> In terms of timeline, relative paths could be a feature in Spec V3, since Spec V1 and V2 assume absolute paths in metadata.
>>>>>>>>>>>>
>>>>>>>>>>>> *Misc*
>>>>>>>>>>>>
>>>>>>>>>>>> 1. Miao: How is the relative path compatible with the absolute path?
>>>>>>>>>>>> 2. How do we migrate an existing table? Build a tool for that.
>>>>>>>>>>>>
>>>>>>>>>>>> Please let us know if you have any ideas, questions, or concerns.
>>>>>>>>>>>>
>>>>>>>>>>>> Yufei
>>>>>>
>>>>>> --
>>>>>> Ryan Blue
>>>>>> Tabular