Thanks for your work on this doc!

Thanks for proposing a solution for accessing buckets with different
layouts through s3g and ofs.  A few points:

The document states, as a convention, that symbolic links from FSO
buckets into the s3v volume should be used for s3g access.  This is fine,
and buckets created through s3g should by convention use the OBS layout.
However, the document also proposes that OBS buckets created by s3g can be
accessed through OFS via symbolic links between s3v and an OFS-accessible
volume.  This currently can't be done.  Are we proposing that a linked
bucket whose source has the OBS layout support file-system semantics when
the link is accessed through OFS?
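For reference, the direction that does work today can be sketched with the Ozone shell (volume, bucket, and link names below are illustrative, not from the document):

```shell
# Create an FSO bucket in a regular volume (names are hypothetical).
ozone sh bucket create --layout FILE_SYSTEM_OPTIMIZED /vol1/fso-bucket

# Link it into the s3v volume so the S3 Gateway can expose it.
ozone sh bucket link /vol1/fso-bucket /s3v/fso-bucket-link

# The reverse direction (linking an s3g-created OBS bucket into an
# OFS-accessible volume and expecting file-system semantics) is what the
# document proposes, but it does not work today.
```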



For HCFS applications such as MapReduce, Spark, and Trino that access
Ozone through both s3g and ofs, it appears that FSO with file-system
semantics is needed.  If such applications create buckets through s3g with
the OBS layout, those buckets currently cannot be accessed through OFS.
Even if symbolically linked OBS buckets could be accessed through ofs, the
buckets backing directories and tables that apps automatically create
through s3g would still have to be manually linked into a volume to be
reachable through OFS.  Thoughts?
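To make the OFS side of this concrete (hostname and paths below are hypothetical), an HCFS application reaches an FSO bucket via an ofs:// URI:

```shell
# List a bucket through OFS; om-host stands in for the Ozone Manager
# address or service id.
hdfs dfs -ls ofs://om-host/vol1/fso-bucket/

# Hadoop-based apps typically pick this up through fs.defaultFS, e.g.:
#   fs.defaultFS = ofs://om-host/
# An OBS bucket created by s3g has no such OFS path today unless it is
# linked into a volume, which is the manual step described above.
```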



Drawing a clear line between OBS and FSO based on access type, and on
porting/migration from an HCFS data store, is a great proposal.  Adopting
a default bucket-layout convention of OBS for buckets created through s3g
and FSO for buckets created through ofs supports this.  To make ports from
Hadoop file systems to Ozone easier and to avoid naming issues caused by
S3 naming conventions, porting through OFS with the FSO layout makes sound
sense.
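If it helps, the proposed convention could be expressed with explicit layouts at creation time; a sketch, with illustrative names, assuming the standard --layout flag and the ozone.default.bucket.layout setting:

```shell
# Buckets intended for Hadoop FS access: create with the FSO layout.
ozone sh bucket create --layout FILE_SYSTEM_OPTIMIZED /vol1/hive-warehouse

# Buckets created through s3g take the server-side default layout
# (ozone.default.bucket.layout), which under this convention would be
# OBJECT_STORE.
```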


Regards,

Neil

On Wed, Mar 29, 2023 at 10:04 PM Ritesh Shukla <rit...@apache.org> wrote:

> Hello,
>
> This topic has been an active discussion internally at Cloudera and has
> been a source of confusion while onboarding new customers. Please take a
> look at the attached document.
>
> This document discusses the differences between two bucket layouts in
> Ozone, OBS (OBJECT_STORE) and FSO (FILE_SYSTEM_OPTIMIZED), and proposes a
> solution to address the complexity of accessing volumes through S3 Gateway
> and Hadoop Filesystem. The proposal suggests using symbolic linking to
> expose FSO buckets via S3 Gateway or vice versa and dividing the
> functionality of OBS and FSO based on their compatibility with S3 APIs and
> Hadoop Filesystem. OBS should always be compatible with S3 APIs and have S3
> bucket names, while FSO should always be compatible with Hadoop File System
> interface. The document also explains how to access FSO via S3 APIs and OBS
> via OFS addressing and the benefits of this approach, including transparent
> data sharing and a clear separation between applications using Ozone
> primarily as an S3 store and Hadoop FS-based apps.
>
>
> https://docs.google.com/document/d/1wVlbJX22yw84WowH6I4ni_pUvaxKDr9JLHHsEVOTYSA/edit#
>
> This document can be broken down into tasks that must be done across the
> stack once reviewed.
>
> Regards,
> Ritesh
>


-- 
NJ
