On Wed, Dec 10, 2014 at 12:20 PM, Ari King <ari.brandeis.k...@gmail.com> wrote:
> Hi,
>
> I'm doing a research paper on Hadoop -- specifically relating to its
> dependency on HDFS. I need to determine if and how HDFS can be replaced. As
> I understand it, there are a number of organizations that have produced
> HDFS alternatives that support the Hadoop ecosystem, i.e. MapReduce, Hive,
> HBase, etc.

There's a difference between producing a storage solution whose
on-the-wire protocol is compatible with HDFS vs. one that implements
the HCFS API (see below).

> With the "if" part being answered, I'd appreciate insight/guidance on the
> "how" part. Essentially, where can I find information on what MapReduce and
> the other Hadoop subprojects require of the underlying file system and how
> these subprojects expect to interact with the file system.

It really boils down to the storage solution exposing a Hadoop Compatible
FileSystem (HCFS) API. This should give you a sufficient overview of the details:
    https://wiki.apache.org/hadoop/HCFS

A lot of open source (Ceph, GlusterFS, etc.) and closed source (Isilon, etc.)
storage solutions do that and can be used as replacements for HDFS.
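For concreteness, here's roughly what plugging one in looks like. Hadoop maps
URI schemes to FileSystem implementation classes through core-site.xml, so an
HCFS implementation just registers its class and (optionally) becomes the
default filesystem. This is only an illustrative sketch -- the scheme, class
name, and host below are hypothetical placeholders; the real values come from
whichever implementation's documentation you're following:

```xml
<!-- core-site.xml (sketch; property values are placeholders) -->
<configuration>
  <!-- Map a URI scheme to the implementation's FileSystem class. -->
  <property>
    <name>fs.examplefs.impl</name>
    <value>org.example.fs.ExampleFileSystem</value>
  </property>
  <!-- Optionally make it the default, so MapReduce, Hive, HBase, etc.
       resolve unqualified paths against it instead of HDFS. -->
  <property>
    <name>fs.defaultFS</name>
    <value>examplefs://storage-host:9000/</value>
  </property>
</configuration>
```

Once that's in place, the Hadoop subprojects talk to the new store through
the same FileSystem API they'd use against HDFS.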

The real question, of course, is what tradeoffs the different implementations
make. That's where it gets fascinating.

Thanks,
Roman.
