Thanks for your work on this doc, and for proposing a solution for accessing buckets with different layouts through s3g and ofs. A few points:
The document states, as a convention, that symbolic links from FSO buckets into the s3v volume should be used for s3g access. This is fine, and buckets created by s3g should by convention use the OBS layout. However, the document also proposes that OBS buckets created by s3g can be accessed through OFS via symbolic links between s3v and an OFS-accessible volume. This currently can't be done. Are we proposing that a linked bucket whose source has OBS layout should support file system semantics when the link is accessed through OFS?

For HCFS applications like MapReduce, Spark, and Trino that access Ozone through both s3g and ofs, it appears that FSO with file system semantics is needed. If applications create buckets through s3g with OBS layout, those buckets currently cannot be accessed through OFS. Even if symlinked OBS buckets could be accessed through OFS, the buckets backing directories and tables that apps automatically create via s3g would still have to be manually linked into a volume before they could be accessed through OFS. Thoughts?

Driving a wedge between OBS and FSO depending on the access type, and on porting/migration from an HCFS datastore, is a great proposal. Adopting a default bucket layout convention of OBS for buckets created through s3g and FSO for buckets created through ofs supports this. To make ports from Hadoop file systems to Ozone easier and to avoid naming issues caused by S3 naming conventions, porting through OFS with FSO layout makes sound sense.

Regards,
Neil

On Wed, Mar 29, 2023 at 10:04 PM Ritesh Shukla <rit...@apache.org> wrote:

> Hello,
>
> This topic has been an active discussion internally at Cloudera and has
> been a source of confusion while onboarding new customers. Please take a
> look at the attached document.
>
> This document discusses the differences between two bucket layouts in
> Ozone, OBS (OBJECT_STORE) and FSO (FILE_SYSTEM_OPTIMIZED), and proposes a
> solution to address the complexity of accessing volumes through S3 Gateway
> and Hadoop Filesystem.
> The proposal suggests using symbolic linking to expose FSO buckets via
> S3 Gateway (or vice versa), and dividing the functionality of OBS and
> FSO based on their compatibility with S3 APIs and the Hadoop Filesystem.
> OBS should always be compatible with S3 APIs and have S3 bucket names,
> while FSO should always be compatible with the Hadoop File System
> interface. The document also explains how to access FSO via S3 APIs and
> OBS via OFS addressing, and the benefits of this approach, including
> transparent data sharing and a clear separation between applications
> using Ozone primarily as an S3 store and Hadoop FS-based apps.
>
> https://docs.google.com/document/d/1wVlbJX22yw84WowH6I4ni_pUvaxKDr9JLHHsEVOTYSA/edit#
>
> This document can be broken down into tasks that must be done across the
> stack once reviewed.
>
> Regards,
> Ritesh

-- 
NJ
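For anyone following along, the FSO-to-s3v linking convention discussed above can be sketched with the Ozone shell. This is only an illustration of the direction that works today (FSO source exposed to s3g via a link); the volume and bucket names are made up, and exact flag spellings should be checked against your Ozone version's `ozone sh` help:

```shell
# Create an FSO bucket intended for Hadoop FS (ofs) access.
# Volume and bucket names below are illustrative, not from the doc.
ozone sh volume create /hadoop-vol
ozone sh bucket create --layout FILE_SYSTEM_OPTIMIZED /hadoop-vol/warehouse

# Expose the FSO bucket to S3 Gateway clients by linking it into the
# s3v volume, which s3g serves buckets from by default.
ozone sh bucket link /hadoop-vol/warehouse /s3v/warehouse

# HCFS apps reach the source bucket at:
#   ofs://<om-host>/hadoop-vol/warehouse
# while S3 clients going through s3g see a bucket named "warehouse".
```

Note the link name in s3v must satisfy S3 bucket naming rules even though the source FSO bucket name does not have to; the reverse direction (an OBS source linked into an OFS-accessed volume) is exactly the case Neil points out as unsupported today.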