Re: Hadoop over Cassandra

Stu Hood Tue, 18 May 2010 10:51:31 -0700

The Hadoop integration (as demonstrated by contrib/word_count) is 
locality-aware: it begins by querying Cassandra to generate locality-aware 
splits, and when the hostnames match between the Hadoop and Cassandra 
clusters, the data can be mapped locally.
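
As a rough illustration of the idea (a minimal sketch, not the actual Cassandra 
or Hadoop code; Split, make_splits, assign and the node names are hypothetical, 
and the real Hadoop scheduler is more involved), each split carries the 
hostnames of the replicas for its token range, and a task is local when one of 
those hostnames is also a Hadoop node:

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Split:
    """A token range plus the hostnames of the replicas that hold it."""
    start_token: int
    end_token: int
    locations: List[str]


def make_splits(ring: List[Tuple[int, int, List[str]]]) -> List[Split]:
    # 'ring' stands in for what a query to Cassandra would return:
    # (start_token, end_token, replica hostnames) per range. Hypothetical shape.
    return [Split(start, end, list(hosts)) for start, end, hosts in ring]


def assign(splits: List[Split], hadoop_hosts: Set[str]) -> List[Tuple[Split, str, bool]]:
    """Return (split, chosen host, is_local) for each split."""
    assignments = []
    for split in splits:
        # A split is local if any replica hostname is also a Hadoop node.
        local = [host for host in split.locations if host in hadoop_hosts]
        if local:
            assignments.append((split, local[0], True))
        else:
            # No hostname match: run anywhere and read over the network.
            assignments.append((split, sorted(hadoop_hosts)[0], False))
    return assignments


if __name__ == "__main__":
    ring = [(0, 100, ["node1", "node2"]), (100, 200, ["node3"])]
    for split, host, is_local in assign(make_splits(ring), {"node1", "node4"}):
        print(split.start_token, split.end_token, host, "local" if is_local else "remote")
```

This is why the hostnames have to match: if Hadoop knows a machine under a 
different name than Cassandra reports, the lookup fails and the split is 
treated as remote.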


-----Original Message-----
From: "Maxim Grinev" <ma...@grinev.net>
Sent: Tuesday, May 18, 2010 2:42am
To: user@cassandra.apache.org
Subject: Re: Hadoop over Cassandra

On Tue, May 18, 2010 at 2:23 AM, Jonathan Ellis <jbel...@gmail.com> wrote:

> On Mon, May 17, 2010 at 4:12 PM, Vick Khera <vi...@khera.org> wrote:
> > On Mon, May 17, 2010 at 3:46 PM, Jonathan Ellis <jbel...@gmail.com>
> wrote:
> >> Moving to the user@ list.
> >>
> >> http://wiki.apache.org/cassandra/HadoopSupport should be useful.
> >
> > That document doesn't really answer the "is data locality preserved"
> > when running the map phase, but my hunch is "no".
>
> The answer is, "yes, as long as you have hadoop on all the cassandra
> machines." (the case where it's easy to map cassandra locality to
> hadoop locality :)


Jonathan,

Could you please clarify this? I also cannot understand how it works. Even
if Hadoop is deployed on all the Cassandra machines, how will Hadoop be
aware of Cassandra's data placement (partitioning and replication)?

Maxim
