It would be a lot of work. Of course there is a lot of overlap, but they have different use cases, so there are significant differences. From the big data side, there are a lot of blockers.
1. CVFS does not have the concept of replication, so there is no way to get
   or set a file's replication.
2. It doesn't look like CVFS supports appending to files.
3. CVFS doesn't support data locality.
4. CVFS positioned reads are difficult/inefficient. The equivalent of
   file.readFully(seekPos, buffer, offset, length) is:
   1. FileContent fc = file.getContent();
   2. RandomAccessContent random = fc.getRandomAccessContent();
   3. random.seek(seekPos);
   4. InputStream stream = random.getInputStream();
   5. Loop until stream.read(buffer, offset, length) returns enough bytes.

.. Owen

On Tue, Mar 10, 2020 at 3:57 PM David Mollitor <dam6...@gmail.com> wrote:

> I just see a lot of overlap and doubling of effort here. Would be nice if
> we can all be working in tandem.
>
> On Tue, Mar 10, 2020, 6:36 PM Aaron Fabbri <ajfab...@gmail.com> wrote:
>
> > It is a good question. I'm not familiar with Apache Commons VFS (which I
> > assume you are talking about, versus the BSD/Unix VFS layer). There will
> > no doubt be semantic differences between the Hadoop FS interface and
> > VFS. It would be an interesting exercise to implement a connector that
> > bridges the gap, running a Hadoop FileSystem etc. on top of the VFS
> > libraries. Has anyone else looked at this or have experience with
> > Apache VFS?
> >
> > On Fri, Feb 28, 2020 at 6:42 AM David Mollitor <dam6...@gmail.com>
> > wrote:
> >
> >> Hello,
> >>
> >> I'm curious to know what the history of the Hadoop File API is in
> >> relationship to VFS. Hadoop supports several file schemes and so does
> >> VFS. Why are there two projects working on this same effort, and what
> >> are the pros/cons of each?
> >>
> >> Thanks.
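[Editor's note: step 5 of the workaround above hides real logic, because a single
InputStream.read() call may return fewer bytes than requested. Below is a minimal
sketch of that read loop using only java.io; the class and method names are
illustrative, and InputStream.skip() stands in for the random.seek(seekPos) call
that the Commons VFS RandomAccessContent would provide.]

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class ReadFullyExample {

    // Step 5 of the workaround: keep calling read() until the buffer
    // region [offset, offset + length) is completely filled, since a
    // single read() is allowed to return fewer bytes than requested.
    static void readFully(InputStream stream, byte[] buffer, int offset, int length)
            throws IOException {
        int total = 0;
        while (total < length) {
            int n = stream.read(buffer, offset + total, length - total);
            if (n < 0) {
                throw new EOFException(
                        "stream ended before " + length + " bytes were read");
            }
            total += n;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello, world".getBytes();
        byte[] buffer = new byte[5];
        InputStream stream = new ByteArrayInputStream(data);
        stream.skip(7); // stands in for random.seek(seekPos) in the VFS version
        readFully(stream, buffer, 0, 5);
        System.out.println(new String(buffer)); // prints "world"
    }
}
```

In Hadoop, PositionedReadable.readFully() does this in one call and leaves the
file position untouched, which is part of why the multi-step VFS equivalent
counts as inefficient.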