+1. Thanks, Marton, for the details and background; they really help to
answer all the questions.

On Fri, Mar 12, 2021 at 12:57 PM Elek, Marton <e...@apache.org> wrote:

>
> If we simplify the picture, the two biggest Apache Ozone advantages
> compared with HDFS are the following (IMHO!):
>
>   1. better scalability: it can handle billions of files
>   2. better interface support: it can be used through multiple interfaces
> (S3, CSI), not only through Hadoop-compatible ones
>
>
> I think the second is as important as the first. Ozone can be
> used not only from Hadoop-compatible tools like Spark and Hive but also
> from any S3-compatible data science or ML tool, or (via a FUSE file
> system) from YARN or Kubernetes containers.
>
>
> There is a well-known slide about this which is used widely (at least by
> me): a big Ozone logo with smaller Hadoop/AWS/K8s logos. It's used in all
> the Ozone videos and other conference presentations, and it is part of the
> official documentation: https://ozone.apache.org/docs/1.0.0/
>
> It was also used when Apache Ozone was shown at a Cloud Native conf to a
> non-Hadoop audience.
>
>
>
> Let's look at the CSI feature more closely:
>
> CSI is nothing more than a very lightweight interface that can receive
> requests from a container orchestrator to create storage (a bucket, in
> our case) and requests to mount it.
>
> (for more information about CSI, check this video:
>
>
> https://www.youtube.com/watch?v=xQwXnuVr8hc&list=PLCaV-jpCBO8UK5Ged2A_iv3eHuozzMsYv&index=10&t=387s
> )
>
>
> The hard part is not the CSI interface; the hard part is mounting.
>
> How can I mount Ozone buckets?
>
>
> Using Ozone (or at least HDDS) as some kind of block store was always
> part of our vision:
>
>   * HDFS-11118 showed how it is possible to mount huge HDDS containers
> (with jscsi) as an ext4 file system
>
> This worked very well, but it wasn't merged back to Hadoop trunk together
> with the other parts, and it had one big limitation: the containers were
> used as a raw storage backend, and files were not visible via other
> interfaces (S3 or ofs/o3fs).
>
>
> To fix this there were multiple experiments:
>
>   * Try to use a libhdfs-based FUSE file system for Ozone (HDDS-3352)
>   * Try to support NFS based on Hadoop NFS support (HDDS-3001)
>
> And (as we have a proper S3-compatible endpoint) we also tried to use
> S3-compatible FUSE file systems. We tested goofys, fixed
> incompatibilities, and it worked well.
>
> But long term, the most effective solution would be a native FUSE driver
> (a prototype can be found at https://github.com/elek/ozone-go and we had
> an agreement to move it to the Apache Ozone repository).
>
>
>
> So today Ozone has simple but working CSI support: it handles CSI
> requests, and the mount command is configurable. The default value is
> goofys, but there are other options, for example
> https://github.com/s3fs-fuse/s3fs-fuse
> or https://github.com/archiecobbs/s3backer
>
> You can use any of the available FUSE drivers, depending on your
> requirements and environment.
>
>
>
> Recently we had a debate with Arpit about the documentation of CSI
> (https://issues.apache.org/jira/browse/HDDS-4904).
>
>
>
> Arpit claims that we should remove the documentation of the CSI driver
> because Goofys (one of the available implementations) is not production
> ready.
>
>
> I have strong concerns about this:
>
>   * Goofys is just one possible configuration value; any other driver
> can be used as the mount implementation
>
>   * As we have this feature implemented, it should be documented
>
>   * It's an important part of Ozone's selling points and we have already
> shared it with the wider community
>
>   * Even today it can be used with the right choice of S3 fuse driver.
>
>   * Default settings may or may not be acceptable in production (it
> depends on whether you need strict POSIX compatibility in your prod env)
>
>
>
> Instead, I suggest that we CLEARLY DOCUMENT the state of CSI, what kind
> of guarantees can be expected, what the risks are, and what the
> long-term plans are:
>
> (my suggested patch is here:
>
>
> https://github.com/elek/ozone/commit/e56b23499686ce5e90c65285099445e5ee0a935f
>
>
> with an updated image:
>
> https://github.com/elek/ozone/blob/csi-alpha/hadoop-hdds/docs/static/ozone-usage.png
> )
>
>
> Please let me know what your opinion is.
>
> Thanks a lot
> Marton
>
