Hello,

yes, by default VFS offers an Input/OutputStream-based interface to the
FileContent and a RandomAccess interface (which is specific to VFS).

I think the current HDFS provider (VFS 2.1) supports only those two
(read-only for the random access).
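
For illustration, a minimal sketch of the two access styles (the URI,
host and path are made up; adjust them to your cluster):

import java.io.InputStream;
import org.apache.commons.vfs2.FileObject;
import org.apache.commons.vfs2.FileSystemManager;
import org.apache.commons.vfs2.RandomAccessContent;
import org.apache.commons.vfs2.VFS;
import org.apache.commons.vfs2.util.RandomAccessMode;

public class VfsAccessSketch {
    public static void main(String[] args) throws Exception {
        FileSystemManager manager = VFS.getManager();
        FileObject file =
            manager.resolveFile("hdfs://namenode:8020/data/example.txt");

        // 1) stream-based access via FileContent
        try (InputStream in = file.getContent().getInputStream()) {
            byte[] buf = new byte[4096];
            int n = in.read(buf); // sequential read
        }

        // 2) VFS-specific random access (HDFS provider: READ only)
        RandomAccessContent rac =
            file.getContent().getRandomAccessContent(RandomAccessMode.READ);
        try {
            rac.seek(100);           // jump to an offset
            byte b = rac.readByte(); // read at that position
        } finally {
            rac.close();
        }
    }
}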

I am not sure whether you can wrap one of the two into an RCFile, or
whether that works only with real HDFS FileSystem objects (I am not
familiar with Hadoop).

There is the possibility to add extensions (operations). One possible
extension would be to retrieve the underlying HDFS file (or an object
implementing the record-based interface).
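
Sketched, such an operation could look like this (the interface and
method names are made up; only the VFS operations API itself is real):

import org.apache.commons.vfs2.operations.FileOperation;

// hypothetical extension: hand out the underlying Hadoop path so a
// record-based reader/writer can be opened on it directly
public interface GetHadoopPathOperation extends FileOperation {
    org.apache.hadoop.fs.Path getHadoopPath();
}

// usage on a resolved FileObject:
//   if (file.getFileOperations().hasOperation(GetHadoopPathOperation.class)) {
//       GetHadoopPathOperation op = (GetHadoopPathOperation)
//           file.getFileOperations().getOperation(GetHadoopPathOperation.class);
//       op.process();
//       org.apache.hadoop.fs.Path p = op.getHadoopPath();
//   }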

That is certainly the way to go if you need that kind of access.
However, if you want such specific HDFS access modes, I wonder whether
it isn't best to use HDFS directly. What is the motivation for wrapping
it in VFS?

BTW: there was some interest in VFS on the HDFS developer mailing list
a few weeks back. If you plan to do anything in that direction, you
might want to involve them as well.

I am copying the commons-dev list, since I am not familiar with the
HDFS provider and this is also a general discussion.

Greetings
Bernd


On Mon, 28 Jul 2014 15:57:57 +0530, Richards Peter
<hbkricha...@gmail.com> wrote:

> Hi Bernd,
> 
> I would like to clarify one more doubt. I found that commons-vfs is
> implemented based on java.io.*. Commons-vfs returns
> java.io.InputStream/java.io.OutputStream for reading/writing files.
> 
> I have a use case to read/write files from/to HDFS. These files may
> be txt (CSV) or RCFiles (Record Columnar Files, using Hive APIs).
> Handling txt files is straightforward: I can wrap the InputStream and
> OutputStream in some reader/writer and read the contents.
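> 
> Something like this sketch for the txt case (the URI is made up):
> 
> import java.io.BufferedReader;
> import java.io.InputStreamReader;
> import org.apache.commons.vfs2.FileObject;
> import org.apache.commons.vfs2.FileSystemManager;
> import org.apache.commons.vfs2.VFS;
> 
> public class CsvFromHdfs {
>     public static void main(String[] args) throws Exception {
>         FileSystemManager manager = VFS.getManager();
>         FileObject file =
>             manager.resolveFile("hdfs://namenode:8020/data/in.csv");
>         // wrap the VFS InputStream in an ordinary reader
>         try (BufferedReader reader = new BufferedReader(
>                 new InputStreamReader(
>                     file.getContent().getInputStream(), "UTF-8"))) {
>             String line;
>             while ((line = reader.readLine()) != null) {
>                 // parse the csv line...
>             }
>         }
>     }
> }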
> 
> However, for RCFiles I have to use:
> https://hive.apache.org/javadocs/r0.10.0/api/org/apache/hadoop/hive/ql/io/RCFile.Writer.html
> https://hive.apache.org/javadocs/r0.10.0/api/org/apache/hadoop/hive/ql/io/RCFile.Reader.html
> 
> In these classes, the methods exposed for writing and reading
> contents are not based on Java input and output streams; they are the
> append() and getCurrentRow() APIs, both of which require
> BytesRefArrayWritable objects.
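> 
> Roughly like this, as far as I understand the 0.10 API (the path and
> column count are made up; I may be misremembering details):
> 
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.hive.ql.io.RCFile;
> import org.apache.hadoop.hive.ql.io.RCFileOutputFormat;
> import org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable;
> import org.apache.hadoop.hive.serde2.columnar.BytesRefWritable;
> import org.apache.hadoop.io.LongWritable;
> 
> public class RcFileSketch {
>     public static void main(String[] args) throws Exception {
>         Configuration conf = new Configuration();
>         FileSystem fs = FileSystem.get(conf);
>         Path path = new Path("/data/example.rc");
>         RCFileOutputFormat.setColumnNumber(conf, 2); // two columns
> 
>         // write: append() takes a BytesRefArrayWritable, not a stream
>         RCFile.Writer writer = new RCFile.Writer(fs, conf, path);
>         BytesRefArrayWritable row = new BytesRefArrayWritable(2);
>         row.set(0, new BytesRefWritable("a".getBytes("UTF-8")));
>         row.set(1, new BytesRefWritable("b".getBytes("UTF-8")));
>         writer.append(row);
>         writer.close();
> 
>         // read: getCurrentRow() fills a BytesRefArrayWritable
>         RCFile.Reader reader = new RCFile.Reader(fs, path, conf);
>         LongWritable rowId = new LongWritable();
>         BytesRefArrayWritable cols = new BytesRefArrayWritable();
>         while (reader.next(rowId)) {
>             reader.getCurrentRow(cols);
>             // cols.get(i) is the i-th column of this row
>         }
>         reader.close();
>     }
> }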
> 
> I think my use case is more related to the file content format,
> reader and writer. What would you recommend in this scenario for
> reading from and writing to such files? Should I just hold the file
> object implementation reference in my own reader and writer classes
> and create RCFile reader and writer instances within those classes?
> Can something else be done using commons-vfs to read from and write
> to files irrespective of the contents (e.g. FileContent,
> FileContentInfo and FileContentInfoFactory)?
> 
> Thanks,
> Richards Peter.
> 
> 
> On Mon, Jul 28, 2014 at 12:36 PM, Richards Peter
> <hbkricha...@gmail.com> wrote:
> 
> > Hi Bernd,
> >
> > Thanks for your response.
> >
> > Our company does not allow the development team to use
> > candidate/snapshot releases of open-source projects. That is the
> > reason why I am looking at the vfs 2.0 version.
> >
> > I am checking the code available in:
> >
> > http://svn.apache.org/viewvc/commons/proper/vfs/trunk/core/src/main/java/org/apache/commons/vfs2/provider/hdfs/
> > and
> >
> > https://github.com/pentaho/pentaho-hdfs-vfs/tree/master/src/org/pentaho/hdfs/vfs
> >
> > I would also like to check whether it is fine to clarify my doubts
> > with you through this mail thread if I face any problems while
> > implementing an HDFS file system for vfs-2.0. I will also check
> > vfs-2.1 and see whether I can contribute to that as well.
> >
> > Regards,
> > Richards Peter.
> >
> >
> > On Mon, Jul 28, 2014 at 1:17 AM, Bernd Eckenfels
> > <e...@zusammenkunft.net> wrote:
> >
> >> Hello Peter,
> >>
> >> I would suggest you use the current version from SVN or the
> >> snapshot builds. This has the big advantage that you can actually
> >> test and contribute to that version in case you miss some features
> >> or find some bugs.
> >>
> >> If you want to implement your own file system provider, you
> >> typically start by copying one of the existing providers and
> >> adapting it. The main work is done in implementing a specific
> >> FileObject which extends AbstractFileObject and implements the
> >> various doSomething() methods.
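> >>
> >> A minimal skeleton of that (the class name is made up; check the
> >> exact constructor signature in the version you use):
> >>
> >> import java.io.InputStream;
> >> import org.apache.commons.vfs2.FileType;
> >> import org.apache.commons.vfs2.provider.AbstractFileName;
> >> import org.apache.commons.vfs2.provider.AbstractFileObject;
> >> import org.apache.commons.vfs2.provider.AbstractFileSystem;
> >>
> >> public class MyFsFileObject extends AbstractFileObject {
> >>     protected MyFsFileObject(AbstractFileName name,
> >>                              AbstractFileSystem fs) {
> >>         super(name, fs);
> >>     }
> >>
> >>     @Override
> >>     protected FileType doGetType() throws Exception {
> >>         return FileType.FILE; // ask the backend: FILE, FOLDER, IMAGINARY
> >>     }
> >>
> >>     @Override
> >>     protected String[] doListChildren() throws Exception {
> >>         return new String[0]; // child base names from the backend
> >>     }
> >>
> >>     @Override
> >>     protected long doGetContentSize() throws Exception {
> >>         return 0; // file length from the backend
> >>     }
> >>
> >>     @Override
> >>     protected InputStream doGetInputStream() throws Exception {
> >>         throw new UnsupportedOperationException("open backend stream here");
> >>     }
> >> }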
> >>
> >> Actually, the JavaDoc of that abstract class is quite good in this
> >> regard.
> >>
> >> After you have implemented the new file system, it can be
> >> registered via addProvider(), or you can add it as a new provider
> >> in the XML configuration of StandardFileSystemManager, as described
> >> here: http://commons.apache.org/proper/commons-vfs/api.html
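> >>
> >> For the programmatic route, a sketch (the scheme name and the
> >> provider class are made up; the manager API is real):
> >>
> >> DefaultFileSystemManager manager = new DefaultFileSystemManager();
> >> manager.addProvider("myfs", new MyFsFileProvider()); // your FileProvider
> >> manager.init();
> >> FileObject f = manager.resolveFile("myfs://host/path/file.txt");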
> >>
> >> Greetings
> >> Bernd
> >>
> >> On Sun, 27 Jul 2014 15:24:57 +0530, Richards Peter
> >> <hbkricha...@gmail.com> wrote:
> >>
> >> > Hi,
> >> >
> >> > I am evaluating commons-vfs 2.0 for one of my use cases. I read
> >> > that commons-vfs 2.1 has a file system implementation for HDFS.
> >> > Since commons-vfs 2.1 is still in development and does not have
> >> > all the capabilities we require for HDFS, I would like to
> >> > implement a custom file system with commons-vfs 2.0 now and
> >> > enhance commons-vfs 2.1 when that release is made.
> >> >
> >> > Could you please tell me how to implement such a file system for
> >> > commons-vfs 2.0? I would like to know:
> >> > 1. The specific classes that need to be implemented.
> >> > 2. How to register/supply these classes so that they can be used
> >> > by my application?
> >> > 3. How name resolution takes place when I provide the file path
> >> > of an HDFS file?
> >> >
> >> > Thanks,
> >> > Richards Peter.
> >>
> >>
> >
> 
