If anyone is interested, there is a great talk from Jonathan Ellis on
the topic of Hadoop & Cassandra (interviewed yesterday):
http://wp.me/pTu1i-40

I never knew that Pig was supported, and I must say it is pretty cool
that you can run Pig scripts against your Cassandra data.

It is a podcast, so grab your headphones and enjoy.

If anyone has "war stories" on the topic of Cassandra & Hadoop (or
even just Hadoop in general), let me know.

/*
Joe Stein
http://www.linkedin.com/in/charmalloc
*/


On Tue, May 18, 2010 at 1:51 PM, Stu Hood <stu.h...@rackspace.com> wrote:
> The Hadoop integration (as demonstrated by contrib/word_count) is
> locality-aware: it begins by querying Cassandra to generate locality-aware
> splits, and when the hostnames match up between the Hadoop and Cassandra
> clusters, the data can be mapped locally.
>
> -----Original Message-----
> From: "Maxim Grinev" <ma...@grinev.net>
> Sent: Tuesday, May 18, 2010 2:42am
> To: user@cassandra.apache.org
> Subject: Re: Hadoop over Cassandra
>
> On Tue, May 18, 2010 at 2:23 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
>
>> On Mon, May 17, 2010 at 4:12 PM, Vick Khera <vi...@khera.org> wrote:
>> > On Mon, May 17, 2010 at 3:46 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>> >> Moving to the user@ list.
>> >>
>> >> http://wiki.apache.org/cassandra/HadoopSupport should be useful.
>> >
>> > That document doesn't really answer the "is data locality preserved"
>> > when running the map phase, but my hunch is "no".
>>
>> The answer is, "yes, as long as you have hadoop on all the cassandra
>> machines." (the case where it's easy to map cassandra locality to
>> hadoop locality :)
>
>
> Jonathan,
>
> could you please clarify this? I also cannot understand how it works. Even
> if Hadoop is deployed on all the Cassandra machines, how will Hadoop be
> aware of Cassandra's data placement (partitioning and replication)?
>
> Maxim
>
>
>
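
To make Stu's description of locality-aware splits concrete, here is a rough
sketch of the idea in Python. It is not the actual Cassandra or Hadoop code:
the function name assign_splits, the shape of the split tuples, and the
hostnames are all made up for illustration. It only shows the principle that
each split carries the hostnames of the replicas holding its data, and that a
task can run locally when one of those hostnames is also a Hadoop node.

```python
from collections import defaultdict


def assign_splits(splits, hadoop_hosts):
    """Assign each split to a Hadoop host, preferring a host holding a replica.

    splits: list of (token_range, [replica hostnames]) tuples (hypothetical shape).
    hadoop_hosts: list of hostnames that run Hadoop tasks.
    Returns a list of (token_range, host, is_local) tuples.
    """
    load = defaultdict(int)  # number of splits assigned to each host so far
    assignments = []
    for token_range, replicas in splits:
        # Hosts that both store this range and run Hadoop: the hostname match.
        local = [h for h in replicas if h in hadoop_hosts]
        # Fall back to any Hadoop host when no hostname matches (remote read).
        candidates = local if local else hadoop_hosts
        # Pick the least-loaded candidate to spread work across the cluster.
        host = min(candidates, key=lambda h: load[h])
        load[host] += 1
        assignments.append((token_range, host, bool(local)))
    return assignments


if __name__ == "__main__":
    # Illustrative data only; these hostnames are invented.
    example_splits = [
        (("0", "100"), ["cass1", "cass2"]),
        (("100", "200"), ["cass2", "cass3"]),
        (("200", "300"), ["cass9"]),  # replica host that runs no Hadoop task
    ]
    for row in assign_splits(example_splits, ["cass1", "cass2", "cass3"]):
        print(row)
```

In this sketch the last split has no matching hostname, so it is assigned to
a non-local host and would have to be read over the network, which is the
case that arises when Hadoop is not running on all the Cassandra machines.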
