During a sync with Yufei and Anurag I had some thoughts on this proposal that I wanted to share with the wider group. As Yufei has previously noted, I'm worried about the alternative configuration parameters (folder-storage, object-storage). Specifically, I'm thinking about the issue of moving a table from S3 to HDFS and vice versa. The table may be using both table-location and an object-storage path on S3, so we need something that correctly relativizes those paths as well.
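To make the concern concrete, here is a rough Python sketch of what I mean. Everything in it is hypothetical: the relativize helper, the file names, and the assumption that the object-storage path sits outside the table location. It is not Iceberg code, just an illustration of why relativizing against table-location alone does not cover files written under the object-storage path.

```python
from typing import Optional


def relativize(absolute: str, root: str) -> Optional[str]:
    """Return `absolute` relative to `root`, or None if it is not under `root`.

    Hypothetical helper for illustration only.
    """
    root = root.rstrip("/") + "/"
    if absolute.startswith(root):
        return absolute[len(root):]
    return None


# Assumed layout: the object-storage path is NOT under the table location.
table_location = "S3://bucket/tableLocation"
object_storage_path = "S3://bucket/objectStorage"

metadata_file = table_location + "/metadata/v1.metadata.json"
data_file = object_storage_path + "/a1b2/data/file.parquet"

# Files under table-location relativize cleanly.
print(relativize(metadata_file, table_location))

# Files under the object-storage path cannot be expressed relative to
# table-location, so they would have to stay absolute in the metadata.
print(relativize(data_file, table_location))
```

Under these assumptions the first call yields a relative path and the second yields None, which is the case that would break when the table moves.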
I was considering that, rather than holding "table-location" as our root path prefix, we use a separate "prefix" parameter that applies to all relative path options. For example, instead of storing locations

  1: S3://bucket/tableLocation
  2: HDFS://clusterx/tableLocation

we store prefixes

  1: S3://bucket
  2: HDFS://clusterx

Then, whenever object-storage is set with a relative path, we prepend the prefix before writing. The files are stored under the prefix, and the metadata records only the path relative to the prefix, not the absolute path. This lets us move our data between buckets or to an HDFS cluster while still being able to use the object-storage path.

> On Sep 17, 2021, at 6:03 PM, Anurag Mantripragada
> <amantriprag...@apple.com.INVALID> wrote:
>
> Hi everyone,
>
> Thanks for sharing your ideas and suggestions on this thread. I believe we
> have consensus on supporting multiple roots for a table and storing relative
> paths in metadata. We can start by adding this support in the initial phase.
> Yufei and I have updated the design doc[1] with these details of this initial
> support. Please share your feedback.
>
> In the next phase, we can look at complex use-cases and support them as
> needed.
>
> [1] -
> https://docs.google.com/document/d/1RDEjJAVEXg1csRzyzTuM634L88vvI0iDHNQQK3kOVR0/edit#heading=h.hxmtkjthp8hm
>
> Thanks,
> Anurag