The Hadoop integration (as demonstrated by contrib/word_count) is locality 
aware: it begins by querying Cassandra to generate locality aware splits, and 
when the hostnames match up between the Hadoop and Cassandra clusters, the data 
can be mapped locally.

-----Original Message-----
From: "Maxim Grinev" <ma...@grinev.net>
Sent: Tuesday, May 18, 2010 2:42am
To: user@cassandra.apache.org
Subject: Re: Hadoop over Cassandra

On Tue, May 18, 2010 at 2:23 AM, Jonathan Ellis <jbel...@gmail.com> wrote:

> On Mon, May 17, 2010 at 4:12 PM, Vick Khera <vi...@khera.org> wrote:
> > On Mon, May 17, 2010 at 3:46 PM, Jonathan Ellis <jbel...@gmail.com>
> wrote:
> >> Moving to the user@ list.
> >>
> >> http://wiki.apache.org/cassandra/HadoopSupport should be useful.
> >
> > That document doesn't really answer the "is data locality preserved"
> > when running the map phase, but my hunch is "no".
>
> The answer is, "yes, as long as you have hadoop on all the cassandra
> machines." (the case where it's easy to map cassandra locality to
> hadoop locality :)


Jonathan,

could you please clarify this. I also cannot understand how it works. Even
if Hadoop is deployed on all the Cassandra machines, how will Hadoop be
aware of Cassandra's data placement (partitioning and replication)?

Maxim


Reply via email to