I am +1 on your proposed changes. They make it clear upfront that the support
is incomplete and relies on an S3 FUSE driver.

Thanks for putting up the patch.


> On Mar 12, 2021, at 9:57 AM, Elek, Marton <e...@apache.org> wrote:
> 
> 
> If we simplify the picture, the two biggest Apache Ozone advantages compared 
> with HDFS are the following (IMHO!):
> 
> 1. better scalability: it can handle billions of files
> 2. better interface support: it can be used through multiple interfaces
> (S3, CSI), not only through Hadoop-compatible ones
> 
> 
> I think the second is as important as the first. Ozone can be used not
> only from Hadoop-compatible tools like Spark and Hive but also from any
> S3-compatible data science or ML tool, or (via a FUSE file system) from Yarn
> or Kubernetes containers.
> 
> 
> There is a well-known slide about this which is used widely (at least by me):
> a big Ozone logo with smaller Hadoop/AWS/K8s logos. It's used in all the Ozone
> videos and other conference presentations, and it is part of the official
> documentation: https://ozone.apache.org/docs/1.0.0/
> 
> It was also used when Apache Ozone was shown at Cloud Native conf to a
> non-Hadoop audience.
> 
> 
> 
> Let's look at the CSI feature more closely:
> 
> CSI is nothing more than a very lightweight interface which can receive
> requests from a container orchestrator to create storage (creating a bucket
> in our case) and requests to mount it.
> 
> (for more information about CSI, check this video:
> 
> https://www.youtube.com/watch?v=xQwXnuVr8hc&list=PLCaV-jpCBO8UK5Ged2A_iv3eHuozzMsYv&index=10&t=387s)
> 
> 
> The hard part is not the CSI interface; the hard part is mounting.
> 
> How can I mount Ozone buckets?
> 
> 
> Using Ozone (or at least HDDS) as some kind of block store was always part of 
> our vision:
> 
> * HDFS-11118 showed how it is possible to mount huge HDDS containers (with
> jscsi) as an ext4 file system
> 
> This worked very well, but it wasn't merged back to Hadoop trunk together
> with the other parts, and it had one big limitation: the containers were used
> as a raw storage backend, and files were not visible via other interfaces (S3
> or ofs/o3fs).
> 
> 
> To fix this there were multiple experiments:
> 
> * Try to use libhdfs based fuse file system for Ozone (HDDS-3352)
> * Try to support NFS based on Hadoop NFS support (HDDS-3001)
> 
> And (as we have a proper S3-compatible endpoint) we also tried to use
> S3-compatible FUSE file systems. We tested goofys, fixed incompatibilities,
> and it worked well.
> 
> But in the long term, the most effective solution would be a native FUSE
> driver (a prototype can be found at https://github.com/elek/ozone-go and we
> had an agreement to move it to the Apache Ozone repository).
> 
> 
> 
> So Ozone has simple but working CSI support today: it handles CSI requests,
> and the mount command is configurable. The default is goofys, but there are
> other options, for example https://github.com/s3fs-fuse/s3fs-fuse
> or https://github.com/archiecobbs/s3backer
> 
> You can use any of the available FUSE drivers, depending on your
> requirements and environment.
> 
> 
> 
> Recently we had a debate with Arpit about the documentation of CSI 
> (https://issues.apache.org/jira/browse/HDDS-4904).
> 
> 
> 
> Arpit claims that we should remove the documentation of the CSI driver
> because Goofys (one of the available implementations) is not production-ready.
> 
> 
> I have strong concerns about this:
> 
> * Goofys is just one possible configuration value; any other driver can be
> used as the mount implementation
> 
> * As we have this feature implemented, it should be documented
> 
> * It's an important part of Ozone's selling points and we have already
> shared it with the wider community
> 
> * Even today it can be used with the right choice of S3 fuse driver.
> 
> * Default settings may or may not be acceptable in production (it depends on
> whether you need strict POSIX compatibility in your production environment)
> 
> 
> 
> Instead, I suggest that we CLEARLY DOCUMENT the state of the CSI support,
> what guarantees can be expected, what the risks are, and what the long-term
> plans are:
> 
> (my suggested patch is here:
> 
> https://github.com/elek/ozone/commit/e56b23499686ce5e90c65285099445e5ee0a935f 
> 
> with an updated image:
> https://github.com/elek/ozone/blob/csi-alpha/hadoop-hdds/docs/static/ozone-usage.png)
> 
> 
> Please let me know what you think.
> 
> Thanks a lot
> Marton
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@ozone.apache.org
> For additional commands, e-mail: dev-h...@ozone.apache.org
> 

