+1. Thanks, Marton, for the details/background; it really helps to answer all the questions.
On Fri, Mar 12, 2021 at 12:57 PM Elek, Marton <e...@apache.org> wrote:
>
> If we simplify the picture, the two biggest Apache Ozone advantages
> compared with HDFS are the following (IMHO!):
>
> 1. better scalability: it can handle billions of files
> 2. better interface support: it can be used from multiple interfaces,
>    not only from Hadoop compatible interfaces (S3, CSI)
>
> I think the second is as important as the first. Ozone can be
> used not only from Hadoop compatible tools like Spark and Hive but also
> from any S3 compatible data science or ML tool, or (via Fuse file
> system) from Yarn or Kubernetes containers.
>
> There is a well-known slide about this which is used widely (at least by
> me): big Ozone logo with smaller Hadoop/AWS/K8s logos. It's used in all
> the Ozone videos and other conference presentations, and it is part of
> the official documentation: https://ozone.apache.org/docs/1.0.0/
>
> It was also used when Apache Ozone was shown at Cloud Native conf for
> a non-Hadoop user audience.
>
> Let's look at the CSI feature more closely:
>
> CSI is nothing more than a very lightweight interface which can receive
> requests from the container orchestrator to create storage (creating a
> bucket in our case) and can receive requests to mount it.
>
> (for more information about CSI, check this video:
>
> https://www.youtube.com/watch?v=xQwXnuVr8hc&list=PLCaV-jpCBO8UK5Ged2A_iv3eHuozzMsYv&index=10&t=387s
> )
>
> The hard part is not the CSI interface, the hard part is mounting.
>
> How can I mount Ozone buckets:
>
> Using Ozone (or at least HDDS) as some kind of block store was always
> part of our vision:
>
> * HDFS-11118 showed how it is possible to mount huge HDDS containers
>   (with jscsi) as an ext4 file system
>
> This worked very well, but it wasn't merged back to Hadoop trunk
> together with the other parts, and it had one big limitation: the
> containers are used as a raw storage backend, and files were not
> visible via other interfaces (S3 or ofs/o3fs)
>
> To fix this there were multiple experiments:
>
> * Try to use a libhdfs based fuse file system for Ozone (HDDS-3352)
> * Try to support NFS based on Hadoop NFS support (HDDS-3001)
>
> And (as we have a proper S3 compatible endpoint) we also tried to use S3
> compatible fuse file systems. We tested goofys, fixed incompatibilities,
> and it worked well.
>
> But long term, the most effective solution would be a native fuse driver
> (a prototype can be found at https://github.com/elek/ozone-go and we had
> an agreement to move it to the Apache Ozone repository).
>
> So Ozone has simple but working CSI support today which handles CSI
> requests, and the mount command is configurable. The default value is
> goofys but there are other options, for example
> https://github.com/s3fs-fuse/s3fs-fuse
> or https://github.com/archiecobbs/s3backer
>
> You can use any of the available fuse drivers based on your requirements
> / environments.
>
> Recently we had a debate with Arpit about the documentation of CSI
> (https://issues.apache.org/jira/browse/HDDS-4904).
>
> Arpit claims that we should remove the documentation of the CSI driver
> because Goofys (one of the available implementations) is not production
> ready.
>
> I have strong concerns against it:
>
> * Goofys is just one possible configuration value, any other driver
>   can be used as the mount implementation
>
> * As we have this feature implemented, it should be documented
>
> * It's an important part of Ozone's selling points and we have already
>   shared it with the wider community
>
> * Even today it can be used with the right choice of S3 fuse driver.
>
> * Default settings may or may not be acceptable in production (depends
>   on whether you need strict POSIX compatibility in your prod env or not)
>
> Instead, I suggest we CLEARLY DOCUMENT the state of the CSI, what kind
> of guarantees can be expected, and what the risks are (and what the
> long-term plans are):
>
> (my suggested patch is here:
>
> https://github.com/elek/ozone/commit/e56b23499686ce5e90c65285099445e5ee0a935f
>
> with updated image:
>
> https://github.com/elek/ozone/blob/csi-alpha/hadoop-hdds/docs/static/ozone-usage.png
> )
>
> Please let me know what your opinion is.
>
> Thanks a lot
> Marton