Sadly, I have missed the meeting :( Quick question: Was table rename / location change discussed for tables with relative paths?
AFAIK when a table rename happens, we do not move the old data / metadata files; we just change the root location for the new data / metadata files. If I am correct about this, then we might need to handle renames differently for tables with relative paths.

Thanks,
Peter

On Fri, 13 Aug 2021, 15:12 Anjali Norwood, <anorw...@netflix.com.invalid> wrote:

> Perfect, thank you Yufei.
>
> Regards
> Anjali
>
> On Thu, Aug 12, 2021 at 9:58 PM Yufei Gu <flyrain...@gmail.com> wrote:
>
>> Hi Anjali,
>>
>> Inline...
>> On Thu, Aug 12, 2021 at 5:31 PM Anjali Norwood
>> <anorw...@netflix.com.invalid> wrote:
>>
>>> Thanks for the summary Yufei.
>>> Sorry, if this was already discussed, I missed the meeting yesterday.
>>> Is there anything in the design that would prevent multiple roots from
>>> being in different AWS regions?
>>>
>> No. DR is the major use case of relative paths, if not the only one. So,
>> it will support roots in different regions.
>>
>>> For disaster recovery in the case of an entire AWS region down or slow,
>>> is metastore still a point of failure or can metastore be stood up in a
>>> different region and could select a different root?
>>>
>> Normally, DR also requires a backup metastore, besides the storage (S3
>> bucket). In that case, the backup metastore will be in a different region
>> along with the table files. For example, the primary table is located in
>> region A as well as its metastore, the backup table is located in region B
>> as well as its metastore. The primary table root points to a path in region
>> A, while backup table root points to a path in region B.
>>
>>
>>> regards,
>>> Anjali.
>>>
>>> On Thu, Aug 12, 2021 at 11:35 AM Yufei Gu <flyrain...@gmail.com> wrote:
>>>
>>>> Here is a summary of yesterday's community sync-up.
>>>>
>>>> Yufei gave a brief update on disaster recovery requirements and the
>>>> current progress of relative path approach.
>>>>
>>>> Ryan: We all agreed that relative path is the way for disaster recovery.
>>>>
>>>> *Multiple roots for the relative path*
>>>>
>>>> Ryan proposed an idea to enable multiple roots for a table, basically,
>>>> we can add a list of roots in table metadata, and use a selector to choose
>>>> different roots when we move the table from one place to another. The
>>>> selector reads a property to decide which root to use. The property could
>>>> be either from catalog or the table metadata, which is yet to be decided.
>>>>
>>>> Here is an example I’d imagine:
>>>>
>>>> 1. Root1: hdfs://nn:8020/path/to/the/table
>>>> 2. Root2: s3://bucket1/path/to/the/table
>>>> 3. Root3: s3://bucket2/path/to/the/table
>>>>
>>>> *Relative path use case*
>>>>
>>>> We brainstormed use cases for relative paths. Please let us know if
>>>> there are any other use cases.
>>>>
>>>> 1. Disaster Recovery
>>>> 2. Jack: AWS S3 bucket alias
>>>> 3. Ryan: fall-back use case. In case that the root1 doesn’t work,
>>>> the table falls back to root2, then root3. As Russell mentioned, it is
>>>> challenging to do snapshot expiration and other table maintenance
>>>> actions.
>>>>
>>>> *Timeline*
>>>>
>>>> In terms of timeline, relative path could be a feature in Spec V3,
>>>> since Spec V1 and V2 assume absolute path in metadata.
>>>>
>>>> *Misc*
>>>>
>>>> 1. Miao: How is the relative path compatible with the absolute path?
>>>>
>>>> 2. How do we migrate an existing table? Build a tool for that.
>>>>
>>>> Please let us know if you have any ideas, questions, or concerns.
>>>>
>>>>
>>>> Yufei
>>>>
>>>
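
To make the multiple-roots idea from the summary concrete, here is a minimal sketch of how a selector might pick a root and resolve relative file paths against it. Everything in it is an assumption for illustration: `TableMetadata`, `select_root`, `resolve`, the `selected-root` property name, and the two relative file names are hypothetical and not part of Iceberg or its spec. Only the three root locations come from the example in the summary.

```python
# Hypothetical sketch: none of these names are Iceberg APIs or spec fields.
from dataclasses import dataclass


@dataclass(frozen=True)
class TableMetadata:
    roots: dict            # root name -> absolute location prefix
    relative_files: tuple  # file paths stored relative to the table root


def select_root(metadata, properties):
    """Pick a root by reading a property (catalog or table level; undecided in the thread)."""
    name = properties.get("selected-root")  # assumed property name
    if name not in metadata.roots:
        raise KeyError(f"unknown root: {name!r}")
    return metadata.roots[name]


def resolve(root, relative_path):
    """Join a root and a relative path into an absolute file location."""
    return root.rstrip("/") + "/" + relative_path.lstrip("/")


metadata = TableMetadata(
    roots={
        "root1": "hdfs://nn:8020/path/to/the/table",
        "root2": "s3://bucket1/path/to/the/table",
        "root3": "s3://bucket2/path/to/the/table",
    },
    # Made-up relative file names, for illustration only.
    relative_files=("metadata/v1.metadata.json", "data/part-00000.parquet"),
)

# The same metadata resolves to different absolute locations per selected root.
primary = [resolve(select_root(metadata, {"selected-root": "root1"}), f)
           for f in metadata.relative_files]
backup = [resolve(select_root(metadata, {"selected-root": "root2"}), f)
          for f in metadata.relative_files]
```

In this sketch every file, old or new, resolves against whichever root is selected. That is why the rename question matters: if a rename only changes the root and the existing files are not moved, the old relative paths would resolve under the new root, where those files do not exist.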