It would be a lot of work. There is of course a lot of overlap, but they
have different use cases, so there are significant differences. From the
big data side, there are several blockers:

   1. CVFS does not have the concept of replication, so there is no way to
   get or set a file's replication.
   2. It doesn't look like CVFS supports appending to files.
   3. CVFS doesn't support data locality.
   4. CVFS positioned reads are difficult/inefficient. The equivalent of
   file.readFully(seekPos, buffer, offset, length) is
   1. FileContent fc = file.getContent();
      2. RandomAccessContent random = fc.getRandomAccessContent();
      3. random.seek(seekPos);
      4. InputStream stream = random.getInputStream()
      5. loop until stream.read(buffer, offset, length) returns enough
      bytes.
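Step 5 is the part that trips people up: a single InputStream.read() call may
return fewer bytes than requested, so the caller has to loop. Here is a minimal,
self-contained sketch of that loop in plain Java, using a ByteArrayInputStream
and skip() as stand-ins for the VFS FileObject/RandomAccessContent.seek() calls
above (the names PositionedRead and readFully here are illustrative, not part of
either API):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class PositionedRead {
    // Step 5: keep calling read() until the requested range is filled,
    // since a single read() may return fewer bytes than asked for.
    static void readFully(InputStream stream, byte[] buffer, int offset, int length)
            throws IOException {
        int total = 0;
        while (total < length) {
            int n = stream.read(buffer, offset + total, length - total);
            if (n < 0) {
                throw new IOException("EOF before reading " + length + " bytes");
            }
            total += n;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello world".getBytes();
        // Stand-in for getting a stream from RandomAccessContent:
        InputStream in = new ByteArrayInputStream(data);
        in.skip(6); // stand-in for random.seek(seekPos)
        byte[] buf = new byte[5];
        readFully(in, buf, 0, 5);
        System.out.println(new String(buf)); // prints "world"
    }
}
```

Note that even with this loop, each positioned read through CVFS rebuilds the
stream from the RandomAccessContent, whereas Hadoop's PositionedReadable is
designed for cheap repeated pread calls.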

.. Owen

On Tue, Mar 10, 2020 at 3:57 PM David Mollitor <dam6...@gmail.com> wrote:

> I just see a lot of overlap and doubling of effort here.  Would be nice if
> we can all be working in tandem.
>
> On Tue, Mar 10, 2020, 6:36 PM Aaron Fabbri <ajfab...@gmail.com> wrote:
>
> > It is a good question. I'm not familiar with Apache commons VFS (which I
> > assume you are talking about, versus the BSD/Unix VFS layer). There no
> > doubt will be semantic differences between Hadoop FS interface and VFS. It
> > would be an interesting exercise to implement a connector that bridges the
> > gap, running a Hadoop FileSystem etc. on top of VFS libraries. Anyone else
> > looked at this or have experience with Apache VFS?
> >
> > On Fri, Feb 28, 2020 at 6:42 AM David Mollitor <dam6...@gmail.com> wrote:
> >
> >> Hello,
> >>
> >> I'm curious to know what the history of Hadoop File API is in relationship
> >> to VFS.  Hadoop supports several file schemes and so does VFS.  Why are
> >> there two projects working on this same effort and what are the pros/cons
> >> of each?
> >>
> >> Thanks.
> >>
> >
>
