Hi Russell, 

I don’t see any major issues with your approach other than that it may
break some customizability of locations. If I understand correctly, today
write.object-storage.path or write.metadata.path can be outside of the table
base location. With your suggestion, are we saying that data and metadata must
always reside inside the prefix? We can do that, but for S3 locations, users
will have to make sure the prefix stays small.
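
To make the trade-off concrete, here is a minimal sketch of the prefix scheme in Python. `relativize` and `resolve` are hypothetical helpers, not Iceberg APIs, and the paths reuse the examples from Russell's message. It shows that moving a table only swaps the prefix, while a path outside the prefix (for example an object-storage path in another bucket) cannot be stored as a relative path at all.

```python
from typing import Optional


def relativize(prefix: str, absolute_path: str) -> Optional[str]:
    """Hypothetical helper: return absolute_path relative to prefix,
    or None if the path does not live under the prefix."""
    # Compare against "prefix/" so that "s3://bucket2/..." does not match "s3://bucket".
    root = prefix.rstrip("/") + "/"
    if absolute_path.startswith(root):
        return absolute_path[len(root):]
    return None


def resolve(prefix: str, relative_path: str) -> str:
    """Hypothetical helper: join a stored relative path onto the current prefix."""
    return prefix.rstrip("/") + "/" + relative_path.lstrip("/")


if __name__ == "__main__":
    s3_prefix = "s3://bucket"
    hdfs_prefix = "hdfs://clusterx"

    # A file written under the table location is recorded relative to the prefix.
    rel = relativize(s3_prefix, "s3://bucket/tableLocation/data/00000-0.parquet")
    print(rel)  # → tableLocation/data/00000-0.parquet

    # Moving the table to HDFS only swaps the prefix; the stored path is unchanged.
    print(resolve(hdfs_prefix, rel))  # → hdfs://clusterx/tableLocation/data/00000-0.parquet

    # An object-storage path in a different bucket has no relative form under this prefix.
    print(relativize(s3_prefix, "s3://other-bucket/data/00000-0.parquet"))  # → None
```

Note that `resolve` also assumes the directory layout under the prefix is identical on S3 and HDFS.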

Another point that Yufei mentioned is that this assumes the folder
layout after the prefix is the same for S3 and HDFS. This may not be true
in the real world.


Regards, 
Anurag

> On Sep 22, 2021, at 9:46 AM, Russell Spitzer <russell.spit...@gmail.com> 
> wrote:
> 
> During a sync with Yufei and Anurag I had some thoughts on this proposal that
> I wanted to share with the wider group. As Yufei has previously noted, I'm
> worried about the alternative configuration parameters like (folder-storage,
> object-storage). Specifically, I'm thinking about the issue of moving a table
> from S3 to HDFS and vice versa. The table may be using table-location and
> object-storage path on S3, so we need something that correctly relativizes
> those paths as well.
> 
> I was considering that rather than holding "table-location" as our root path 
> prefix we use a separate "prefix" parameter which is used on all relative 
> path options. For example, instead of storing locations
> 
> 1: S3://bucket/tableLocation
> 2: HDFS://clusterx/tableLocation
> 
> We store prefixes
> 
> 1: S3://bucket
> 2: HDFS://clusterx
> 
> Then, whenever object-storage is set to a relative path, we prepend the
> prefix before writing. The files will be stored under the prefix, but the
> metadata records only the path relative to the prefix, not the absolute
> path. This lets us move our data between buckets or to an HDFS cluster while
> also being able to use the object-storage path.
> 
>> On Sep 17, 2021, at 6:03 PM, Anurag Mantripragada 
>> <amantriprag...@apple.com.INVALID> wrote:
>> 
>> Hi everyone, 
>> 
>> 
>> Thanks for sharing your ideas and suggestions on this thread. I believe we 
>> have consensus on supporting multiple roots for a table and storing relative 
>> paths in metadata. We can start by adding this support in the initial phase. 
>> Yufei and I have updated the design doc[1] with the details of this
>> initial support. Please share your feedback.
>> 
>> In the next phase, we can look at complex use-cases and support them as 
>> needed.
>> 
>> [1] - 
>> https://docs.google.com/document/d/1RDEjJAVEXg1csRzyzTuM634L88vvI0iDHNQQK3kOVR0/edit#heading=h.hxmtkjthp8hm
>> 
>> Thanks, 
>> Anurag
>> 
>> 
> 
