If anyone is interested, there is a great talk from Jonathan Ellis on the topic of Hadoop & Cassandra (interviewed yesterday): http://wp.me/pTu1i-40
I never knew that Pig was supported, and I must say it is pretty kewl that you can run Pig scripts against your Cassandra data. It is a podcast, so grab your headphones and enjoy.

If anyone has "war stories" on the topic of Cassandra & Hadoop (or even just Hadoop in general), let me know.

/*
Joe Stein
http://www.linkedin.com/in/charmalloc
*/

On Tue, May 18, 2010 at 1:51 PM, Stu Hood <stu.h...@rackspace.com> wrote:

> The Hadoop integration (as demonstrated by contrib/word_count) is locality
> aware: it begins by querying Cassandra to generate locality aware splits, and
> when the hostnames match up between the Hadoop and Cassandra clusters, the
> data can be mapped locally.
>
> -----Original Message-----
> From: "Maxim Grinev" <ma...@grinev.net>
> Sent: Tuesday, May 18, 2010 2:42am
> To: user@cassandra.apache.org
> Subject: Re: Hadoop over Cassandra
>
> On Tue, May 18, 2010 at 2:23 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
>
>> On Mon, May 17, 2010 at 4:12 PM, Vick Khera <vi...@khera.org> wrote:
>> > On Mon, May 17, 2010 at 3:46 PM, Jonathan Ellis <jbel...@gmail.com>
>> wrote:
>> >> Moving to the user@ list.
>> >>
>> >> http://wiki.apache.org/cassandra/HadoopSupport should be useful.
>> >
>> > That document doesn't really answer the "is data locality preserved"
>> > when running the map phase, but my hunch is "no".
>>
>> The answer is, "yes, as long as you have hadoop on all the cassandra
>> machines." (the case where it's easy to map cassandra locality to
>> hadoop locality :)
>
>
> Jonathan,
>
> could you please clarify this. I also cannot understand how it works. Even
> if Hadoop is deployed on all the Cassandra machines, how will Hadoop be
> aware of Cassandra's data placement (partitioning and replication)?
>
> Maxim
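For anyone puzzling over the same question Maxim asked, here is a toy model of the mechanism Stu describes (splits that carry replica hostnames, plus a scheduler that prefers matching hosts). It is a minimal sketch, not Cassandra's actual input format: `TokenRange`, `Split`, `buildSplits`, and `pickTracker` are hypothetical names, and the node names and token ranges are made-up sample data. In real Hadoop the analogous hook is `InputSplit.getLocations()`, which the scheduler uses as a placement hint.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of locality-aware split generation and scheduling.
// None of these names come from Cassandra's or Hadoop's real APIs.
public class LocalitySplits {

    // One token range plus the hosts holding a replica of it
    // (in a real cluster this information would come from Cassandra itself).
    record TokenRange(String startToken, String endToken, List<String> replicaHosts) {}

    // A map-task input: the range to scan and the hosts where scanning is local.
    record Split(TokenRange range, String[] locations) {}

    // Build one split per token range, advertising the replica hosts as locations.
    static List<Split> buildSplits(List<TokenRange> ranges) {
        List<Split> splits = new ArrayList<>();
        for (TokenRange r : ranges) {
            splits.add(new Split(r, r.replicaHosts().toArray(new String[0])));
        }
        return splits;
    }

    // Toy scheduler: prefer a free worker whose hostname matches a split location;
    // otherwise fall back to the first free worker (a remote read).
    // Assumes at least one free worker.
    static String pickTracker(Split split, List<String> freeTrackers) {
        for (String host : split.locations()) {
            if (freeTrackers.contains(host)) {
                return host;
            }
        }
        return freeTrackers.get(0);
    }

    // A task is data-local when its worker is one of the split's replica hosts.
    static boolean isLocal(Split split, String tracker) {
        return List.of(split.locations()).contains(tracker);
    }

    public static void main(String[] args) {
        List<TokenRange> ranges = List.of(
            new TokenRange("0", "100", List.of("node1", "node2")),
            new TokenRange("100", "200", List.of("node2", "node3")));
        List<String> freeTrackers = List.of("node3", "node4");

        for (Split s : buildSplits(ranges)) {
            String tracker = pickTracker(s, freeTrackers);
            System.out.printf("range (%s, %s] -> %s (%s)%n",
                s.range().startToken(), s.range().endToken(), tracker,
                isLocal(s, tracker) ? "local" : "remote");
        }
    }
}
```

In this sample only `node3` both runs a worker and holds a replica, so only the second range is read locally. That is the point of "hadoop on all the cassandra machines": when every replica host is also a worker, a matching hostname is usually available.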