On Wed, Dec 10, 2014 at 12:20 PM, Ari King <ari.brandeis.k...@gmail.com> wrote:
> Hi,
>
> I'm doing a research paper on Hadoop -- specifically relating to its
> dependency on HDFS. I need to determine if and how HDFS can be replaced. As
> I understand it, there are a number of organizations that have produced
> HDFS alternatives that support the Hadoop ecosystem, i.e. MapReduce, Hive,
> HBase, etc.
There's a difference between producing a storage solution with an on-the-wire
protocol compatible with HDFS vs. an HCFS one (see below).

> With the "if" part being answered, I'd appreciate insight/guidance on the
> "how" part. Essentially, where can I find information on what MapReduce and
> the other Hadoop subprojects require of the underlying file system and how
> these subprojects expect to interact with the file system.

It really boils down to the storage solution exposing a Hadoop Compatible
Filesystem (HCFS) API. This should give you a sufficient overview of the
details:
    https://wiki.apache.org/hadoop/HCFS

A lot of open source (Ceph, GlusterFS, etc.) and closed source storage
solutions (Isilon, etc.) do that and can be used as a replacement for HDFS.
The real question, of course, is all the different tradeoffs that the
implementations are making. That's where it gets fascinating.

Thanks,
Roman.
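P.S. In case a concrete example helps the paper: from the deployment side,
"exposing an HCFS API" means the alternative store ships a class implementing
org.apache.hadoop.fs.FileSystem, and you point Hadoop at it via core-site.xml.
A minimal sketch, assuming the GlusterFS plugin's URI scheme and class name
(the exact values vary by plugin and version, so check its docs):

```xml
<!-- core-site.xml: sketch of pointing Hadoop at a non-HDFS store.
     The glusterfs:// URI and implementation class are illustrative,
     not a drop-in recipe for any particular deployment. -->
<configuration>
  <!-- Clients, MapReduce, Hive, etc. resolve paths against this URI. -->
  <property>
    <name>fs.defaultFS</name>
    <value>glusterfs://server:port/volume</value>
  </property>
  <!-- Maps the URI scheme to the HCFS implementation on the classpath. -->
  <property>
    <name>fs.glusterfs.impl</name>
    <value>org.apache.hadoop.fs.glusterfs.GlusterFileSystem</value>
  </property>
</configuration>
```

Because the higher-level projects only code against the FileSystem
abstraction, a config change like this is (in principle) all they see of the
swap; the interesting differences show up in the semantics and performance
tradeoffs underneath.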