During a sync with Yufei and Anurag I had some thoughts on this proposal that I 
wanted to share with the wider group. As Yufei has previously noted, I'm 
worried about the alternative configuration parameters such as folder-storage 
and object-storage. Specifically, I'm thinking about the issue of moving a 
table from S3 to HDFS and vice versa. The table may be using both 
table-location and an object-storage path on S3, so we need something that 
correctly relativizes those paths as well.

I was considering that, rather than holding "table-location" as our root path 
prefix, we use a separate "prefix" parameter that applies to all relative path 
options. For example, instead of storing locations

1: S3://bucket/tableLocation
2: HDFS://clusterx/tableLocation

We store prefixes

1: S3://bucket
2: HDFS://clusterx

Then, whenever object-storage is set with a relative path, we prepend the 
prefix before writing. The files are stored under the prefix but recorded in 
the metadata using only the path relative to the prefix, not the absolute 
path. This lets us move our data between buckets or to an HDFS cluster while 
still being able to use the object-storage path.
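
To make the idea concrete, here is a rough sketch in Python of how a prefix 
could be combined with the relative paths kept in metadata. The function names 
(resolve, relativize) and the example file name are made up for illustration 
and are not part of Iceberg or of the design doc; treat it as one possible 
shape, not a proposed implementation.

```python
# Hypothetical sketch: none of these names are existing Iceberg APIs.

def resolve(prefix: str, relative_path: str) -> str:
    """Build the absolute location by prepending the active prefix."""
    return prefix.rstrip("/") + "/" + relative_path.lstrip("/")


def relativize(prefix: str, absolute_path: str) -> str:
    """Strip the prefix so that only the relative path is kept in metadata."""
    root = prefix.rstrip("/") + "/"
    if not absolute_path.startswith(root):
        raise ValueError(f"{absolute_path} is not under prefix {prefix}")
    return absolute_path[len(root):]


# Metadata records only relative paths (the file name here is invented).
metadata_paths = ["tableLocation/data/file-0001.parquet"]

# The same metadata resolves against whichever prefix is currently configured.
on_s3 = [resolve("S3://bucket", p) for p in metadata_paths]
on_hdfs = [resolve("HDFS://clusterx", p) for p in metadata_paths]
```

With this shape, moving the table would only mean changing the prefix; the 
relative entries in metadata would not need to be rewritten.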

> On Sep 17, 2021, at 6:03 PM, Anurag Mantripragada 
> <amantriprag...@apple.com.INVALID> wrote:
> 
> Hi everyone, 
> 
> 
> Thanks for sharing your ideas and suggestions on this thread. I believe we 
> have consensus on supporting multiple roots for a table and storing relative 
> paths in metadata. We can start by adding this support in the initial phase. 
> Yufei and I have updated the design doc[1] with these details of this initial 
> support. Please share your feedback.
> 
> In the next phase, we can look at complex use-cases and support them as 
> needed.
> 
> [1] - 
> https://docs.google.com/document/d/1RDEjJAVEXg1csRzyzTuM634L88vvI0iDHNQQK3kOVR0/edit#heading=h.hxmtkjthp8hm
> 
> Thanks, 
> Anurag
> 
> 
