Jack, I agree with just about everything you've said.

On Sun, Aug 29, 2021 at 2:13 PM Jack Ye <yezhao...@gmail.com> wrote:

Trying to catch up with the conversation here, just typing out some of my thought process.

Based on my understanding, there are in general 2 use cases:

1. Multiple data copies in different physical storages, which includes:
1.1 Disaster recovery: if one storage is completely down, all access needs to be redirected to another storage.
1.2 Multi-tier data access: a table is stored in multiple tiers, such as a fast single-availability-zone (AZ) tier, a common multi-AZ tier, and a slow but cheap archival tier.
1.3 Table migration: a user can manually migrate a table to point to a completely different location with a completely different set of data files.

2. Different access points referring to the same storage, which includes:
2.1 Access points: different names are pointed to the same physical storage with different access policies.

I think we have mostly reached the consensus that multiple root locations are needed. Now we need to ask the question: "Can a user access the storage from the catalog to at least get the table metadata and read all the different root locations?" I think the answer is yes for 1.2 and 1.3, but no for 1.1 and 2.1. Therefore, we need to deal with those 2 situations separately.

For 1.1 and 2.1, the metadata cannot be accessed by the user unless an alternative root bucket/storage endpoint is provided at the catalog level, so this information should be put as a part of the catalog API.

For 1.2 and 1.3, we need some more details about the characteristics of each root location, specifically:
1. Whether the location is read-only or can also be written to. Typically we should only allow a single "master" write location, to avoid the complicated situation of data in different locations going out of sync, unless the storage supports bi-directional sync.
2. How a reader/writer should choose which root to access. To achieve that, I think we need a root location resolver interface, with implementations like RegionBasedRootLocationResolver, AccessPolicyBasedRootLocationResolver, etc. (this is just my brainstorming; a rough sketch follows below).

-Jack Ye
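To make the resolver brainstorming concrete, here is a minimal sketch of what such an interface might look like. Everything in it is hypothetical: the method signature, the "client.region" context key, and the naive region-matching logic are invented for illustration and are not part of Iceberg.

    import java.util.List;
    import java.util.Map;

    // Hypothetical interface; only the class names come from Jack's message.
    interface RootLocationResolver {
      // Choose one root from the table's configured root locations,
      // given caller context such as region or access policy.
      String resolve(List<String> rootLocations, Map<String, String> context);
    }

    // One possible implementation: prefer a root whose URI mentions the
    // caller's region, falling back to the first (primary) root.
    class RegionBasedRootLocationResolver implements RootLocationResolver {
      @Override
      public String resolve(List<String> rootLocations, Map<String, String> context) {
        String region = context.get("client.region"); // assumed context key
        if (region != null) {
          for (String root : rootLocations) {
            if (root.contains(region)) { // naive match, e.g. "us-west-2" in the bucket name
              return root;
            }
          }
        }
        return rootLocations.get(0); // default: the primary root
      }
    }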
On Mon, Aug 23, 2021 at 4:06 PM Yufei Gu <flyrain...@gmail.com> wrote:

@Ryan, how do these properties work with multiple table locations?

1. write.metadata.path
2. write.folder-storage.path
3. write.object-storage.path

The current logic with a single table location is to honor these properties on top of the table location. In the case of multiple roots (table locations), we cannot mix them. For example, write.object-storage.path and the table location should be in the same region for the DR use case.

One solution that keeps the logic similar is to support all of these properties for each root, like this (sketched after this message):

1. Root1: (table location1, metadata path1, folder-storage path1, object-storage path1)
2. Root2: (table location2, metadata path2, folder-storage path2, object-storage path2)
3. Root3: (table location3, null, null, null)

What do you think?

@Anjali, I am not sure supporting federation by using multiple roots is a good idea. Can we create a partition with a specific location to distinguish data between regions?
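For illustration only, the per-root grouping above could be expressed as prefixed table properties. This is a minimal sketch under the assumption of hypothetical keys like root1.location; none of these keys exist in the Iceberg spec.

    import java.util.HashMap;
    import java.util.Map;

    class PerRootPropertiesExample {
      // Hypothetical keys illustrating the (location, metadata path,
      // folder-storage path, object-storage path) tuple per root.
      static Map<String, String> example() {
        Map<String, String> props = new HashMap<>();
        props.put("root1.location", "s3://primary-bucket/db/table");
        props.put("root1.write.metadata.path", "s3://primary-bucket/db/table/metadata");
        props.put("root1.write.folder-storage.path", "s3://primary-bucket/db/table/data");
        props.put("root1.write.object-storage.path", "s3://primary-objects/db/table");
        props.put("root2.location", "s3://backup-bucket/db/table");
        props.put("root2.write.metadata.path", "s3://backup-bucket/db/table/metadata");
        props.put("root2.write.folder-storage.path", "s3://backup-bucket/db/table/data");
        props.put("root2.write.object-storage.path", "s3://backup-objects/db/table");
        // Root3 sets only a location, matching "(table location3, null, null, null)";
        // the write-path properties would fall back to defaults under that root.
        props.put("root3.location", "hdfs://nn:8020/db/table");
        return props;
      }
    }

The point the sketch tries to capture is Yufei's constraint: every write path is anchored under exactly one root, so a DR copy never mixes regions.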
On Mon, Aug 23, 2021 at 1:14 PM Anjali Norwood <anorw...@netflix.com.invalid> wrote:

Hi Ryan, All,

*"The more I think about this, the more I like the solution to add multiple table roots to metadata, rather than removing table roots. Adding a way to plug in a root selector makes a lot of sense to me and it ensures that the metadata is complete (table location is set in metadata) and that multiple locations can be used. Are there any objections or arguments against doing it that way?"*

In the context of your comment on multiple locations above, I am thinking about the following scenarios:
1) Disaster recovery, or a low-latency use case where clients connect to the region geographically closest to them: here, multiple table roots represent copies of the table, and the copies may or may not be in sync. (Most likely active-active replication would be set up in this case, and the copies would be near-identical.) A root-level selector works/makes sense.
2) Data residency requirement: data must not leave a country/region. In this case, the federation of data from multiple roots constitutes the entirety of the table.
3) One can also imagine combinations of 1 and 2 above, where some locations need to be federated and some locations have data replicated from other locations.

Curious how the 2nd and 3rd scenarios would be supported with this design.

regards,
Anjali.

On Sun, Aug 22, 2021 at 11:22 PM Peter Vary <pv...@cloudera.com.invalid> wrote:

@Ryan: If I understand correctly, it is currently possible to change the root location of a table, and doing so will not change/move the old data/metadata files created before the change; only the new data/metadata files will be created in the new location.

Are we planning to make sure that tables with relative paths will always contain every data/metadata file in a single root folder?

Here are a few scenarios I am thinking about:
1. Table T1 is created with a relative path in HadoopCatalog/HiveCatalog, where the root location is L1, and this location is generated from the TableIdentifier (at least at creation time).
2. Data is inserted into the table, so data files are created under L1, and the metadata files contain the relative path R1 against the current L1.
3a. The table location is updated to L2 (Hive: ALTER TABLE T1 SET LOCATION L2).
3b. The table is renamed to T2 (Hive: ALTER TABLE T1 RENAME TO T2). The table location is updated for Hive tables if the old location was the default location.
4. When we try to read this table, we should read the old data/metadata files as well.

So in both cases we have to move the old data/metadata files around like Hive does for its native tables, while for tables with relative paths we do not have to change the metadata other than the root path? Will we do the same thing with other engines as well?

Thanks,
Peter

On Mon, 23 Aug 2021, 06:38 Yufei Gu <flyrain...@gmail.com> wrote:

Steven, here is my understanding. It depends on whether you want to move the data. In the DR case, we do move the data; we expect the copies to be identical from time to time, but not always. In the case of S3 aliases, different roots actually point to the same location, so there is no data movement and the data is identical for sure.

On Sun, Aug 22, 2021 at 8:19 PM Steven Wu <stevenz...@gmail.com> wrote:

For the multiple table roots, do we expect or ensure that the data are identical across the different roots? Or is this best-effort background synchronization across the different roots?
On Sun, Aug 22, 2021 at 11:53 AM Ryan Blue <b...@tabular.io> wrote:

Peter, I think that this feature would be useful when moving tables between root locations or when you want to maintain multiple root locations. Renames are orthogonal, because a rename doesn't change the table location. You may want to move the table after a rename, and this would help in that case. But actually moving the data is optional. That's why we put the table location in metadata.

Anjali, DR is a big use case, but we also talked about directing accesses through other URLs, like S3 access points, table migration (like the rename case), and background data migration (e.g. lifting files between S3 regions). There are a few uses for it.

The more I think about this, the more I like the solution to add multiple table roots to metadata, rather than removing table roots. Adding a way to plug in a root selector makes a lot of sense to me and it ensures that the metadata is complete (the table location is set in metadata) and that multiple locations can be used. Are there any objections or arguments against doing it that way?

--
Ryan Blue
Tabular
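One possible wiring for the pluggable selector described above, reusing the RootLocationResolver interface sketched earlier in this thread. The property key root-selector.impl and the reflective loading are assumptions for illustration, not existing Iceberg behavior; per the discussion, the property could come from either the catalog or the table metadata.

    import java.util.List;
    import java.util.Map;

    class RootSelectorWiring {
      // Hypothetical property key naming the resolver implementation.
      static final String SELECTOR_IMPL = "root-selector.impl";

      static String selectRoot(Map<String, String> properties, List<String> roots,
                               Map<String, String> context) throws Exception {
        String impl = properties.getOrDefault(
            SELECTOR_IMPL, RegionBasedRootLocationResolver.class.getName());
        // Load the configured resolver reflectively, as pluggable impls usually are.
        RootLocationResolver resolver = (RootLocationResolver)
            Class.forName(impl).getDeclaredConstructor().newInstance();
        return resolver.resolve(roots, context);
      }
    }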
On Fri, Aug 20, 2021 at 9:00 AM Anjali Norwood <anorw...@netflix.com.invalid> wrote:

Hi,

This thread is about disaster recovery and relative paths, but I wanted to ask an orthogonal but related question. Do we see disaster recovery as the only (or main) use case for multi-region? Is a data residency requirement a use case for anybody? Is it possible to shard an Iceberg table across regions? How is the location managed in that case?

thanks,
Anjali.

On Fri, Aug 20, 2021 at 12:20 AM Peter Vary <pv...@cloudera.com.invalid> wrote:

Sadly, I missed the meeting :(

Quick question: was table rename / location change discussed for tables with relative paths?

AFAIK when a table rename happens, we do not move the old data/metadata files; we just change the root location for the new data/metadata files. If I am correct about this, then we might need to handle renames differently for tables with relative paths.

Thanks, Peter

On Fri, 13 Aug 2021, 15:12 Anjali Norwood <anorw...@netflix.com.invalid> wrote:

Perfect, thank you Yufei.

Regards,
Anjali

On Thu, Aug 12, 2021 at 9:58 PM Yufei Gu <flyrain...@gmail.com> wrote:

Hi Anjali,

Inline...

On Thu, Aug 12, 2021 at 5:31 PM Anjali Norwood <anorw...@netflix.com.invalid> wrote:

> Thanks for the summary, Yufei. Sorry if this was already discussed; I missed the meeting yesterday.
> Is there anything in the design that would prevent multiple roots from being in different AWS regions?

No. DR is the major use case for relative paths, if not the only one. So yes, it will support roots in different regions.

> For disaster recovery, in the case of an entire AWS region being down or slow, is the metastore still a point of failure, or can a metastore be stood up in a different region and select a different root?

Normally, DR also requires a backup metastore besides the storage (the S3 bucket). In that case, the backup metastore will be in a different region along with the table files. For example, the primary table is located in region A along with its metastore, and the backup table is located in region B along with its metastore. The primary table root points to a path in region A, while the backup table root points to a path in region B.

> regards,
> Anjali.

On Thu, Aug 12, 2021 at 11:35 AM Yufei Gu <flyrain...@gmail.com> wrote:

Here is a summary of yesterday's community sync-up.

Yufei gave a brief update on the disaster recovery requirements and the current progress of the relative path approach.

Ryan: We all agreed that relative paths are the way to support disaster recovery.

*Multiple roots for the relative path*

Ryan proposed an idea to enable multiple roots for a table: basically, we can add a list of roots to table metadata and use a selector to choose a different root when we move the table from one place to another. The selector reads a property to decide which root to use. The property could come either from the catalog or from the table metadata, which is yet to be decided.

Here is an example I'd imagine (picked up in the sketch at the end of this message):

1. Root1: hdfs://nn:8020/path/to/the/table
2. Root2: s3://bucket1/path/to/the/table
3. Root3: s3://bucket2/path/to/the/table

*Relative path use cases*

We brainstormed use cases for relative paths. Please let us know if there are any other use cases.

1. Disaster recovery
2. Jack: AWS S3 bucket aliases
3. Ryan: fall-back use case. In case root1 doesn't work, the table falls back to root2, then root3. As Russell mentioned, this makes snapshot expiration and other table maintenance actions challenging.

*Timeline*

In terms of timeline, relative paths could be a feature in Spec V3, since Spec V1 and V2 assume absolute paths in metadata.

*Misc*

1. Miao: How is the relative path compatible with the absolute path?
2. How do we migrate an existing table? Build a tool for that.

Please let us know if you have any ideas, questions, or concerns.

Yufei
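To ground the summary's example: assuming manifests store paths relative to the chosen root (which is the yet-to-be-specified part), resolution could be simple prefixing over the three roots above. The helper and the manifest entry below are invented for illustration.

    import java.util.List;

    class RelativePathExample {
      // Yufei's example roots, ordered for Ryan's fall-back idea:
      // try Root1 first, then Root2, then Root3.
      static final List<String> ROOTS = List.of(
          "hdfs://nn:8020/path/to/the/table",
          "s3://bucket1/path/to/the/table",
          "s3://bucket2/path/to/the/table");

      // A manifest entry would store only the part below the root.
      static String absolutePath(String root, String relative) {
        return root.endsWith("/") ? root + relative : root + "/" + relative;
      }

      public static void main(String[] args) {
        String rel = "data/part-00000.parquet"; // hypothetical manifest entry
        for (String root : ROOTS) {
          System.out.println(absolutePath(root, rel));
        }
        // hdfs://nn:8020/path/to/the/table/data/part-00000.parquet
        // s3://bucket1/path/to/the/table/data/part-00000.parquet
        // s3://bucket2/path/to/the/table/data/part-00000.parquet
      }
    }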
--
Ryan Blue
Tabular