Re: Hadoop over Cassandra

Ben Browning Tue, 18 May 2010 04:17:32 -0700

Maxim,

Check out the getLocation() method from this file:


http://svn.apache.org/repos/asf/cassandra/trunk/src/java/org/apache/cassandra/hadoop/ColumnFamilyRecordReader.java

Basically, it loops over the list of nodes that hold this split of
data, and if any of them is the local node, it returns that one.
Otherwise it returns the first node that contains the data.
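
To make the selection rule concrete, here is a minimal sketch of the logic
described above. It is not the actual Cassandra code: the class name, the
method name, and the choice to compare hosts as plain strings are all
assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; names do not come from Cassandra.
class SplitLocationSketch {

    // Prefer the local node if it holds the split; otherwise fall back
    // to the first node in the list of nodes that hold the split.
    static String chooseLocation(List<String> replicaHosts, String localHost) {
        if (replicaHosts.isEmpty()) {
            throw new IllegalArgumentException("split has no known locations");
        }
        for (String host : replicaHosts) {
            // Assumption: hosts are compared as case-insensitive names.
            if (host.equalsIgnoreCase(localHost)) {
                return host;
            }
        }
        return replicaHosts.get(0);
    }

    public static void main(String[] args) {
        List<String> replicas = Arrays.asList("node-a", "node-b", "node-c");
        // Local node holds a replica, so it is chosen.
        System.out.println(chooseLocation(replicas, "node-b"));
        // Local node holds no replica, so the first replica is chosen.
        System.out.println(chooseLocation(replicas, "node-z"));
    }
}
```

With the inputs in main(), the first call picks node-b (the local node) and
the second falls back to node-a, the first node in the list.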

The code that creates the splits of data and figures out which node
each split is located on is here:

http://svn.apache.org/repos/asf/cassandra/trunk/src/java/org/apache/cassandra/hadoop/ColumnFamilyInputFormat.java


Ben

On Tue, May 18, 2010 at 3:42 AM, Maxim Grinev <ma...@grinev.net> wrote:
>
> On Tue, May 18, 2010 at 2:23 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>
>> On Mon, May 17, 2010 at 4:12 PM, Vick Khera <vi...@khera.org> wrote:
>> > On Mon, May 17, 2010 at 3:46 PM, Jonathan Ellis <jbel...@gmail.com>
>> > wrote:
>> >> Moving to the user@ list.
>> >>
>> >> http://wiki.apache.org/cassandra/HadoopSupport should be useful.
>> >
>> > That document doesn't really answer the "is data locality preserved"
>> > when running the map phase, but my hunch is "no".
>>
>> The answer is, "yes, as long as you have hadoop on all the cassandra
>> machines." (the case where it's easy to map cassandra locality to
>> hadoop locality :)
>
> Jonathan,
> could you please clarify this. I also cannot understand how it works. Even
> if Hadoop is deployed on all the Cassandra machines, how will Hadoop be
> aware of Cassandra's data placement (partitioning and replication)?
> Maxim
>
>
