Maxim,

Check out the getLocation() method from this file:
http://svn.apache.org/repos/asf/cassandra/trunk/src/java/org/apache/cassandra/hadoop/ColumnFamilyRecordReader.java

Basically, it loops over the list of nodes containing this split of data and if any of them are the local node, it returns that. Otherwise it returns the first node that contains the data.

The code that creates the splits of data and figures out which node each split is located on is here:
http://svn.apache.org/repos/asf/cassandra/trunk/src/java/org/apache/cassandra/hadoop/ColumnFamilyInputFormat.java

Ben

On Tue, May 18, 2010 at 3:42 AM, Maxim Grinev <ma...@grinev.net> wrote:
>
> On Tue, May 18, 2010 at 2:23 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>
>> On Mon, May 17, 2010 at 4:12 PM, Vick Khera <vi...@khera.org> wrote:
>> > On Mon, May 17, 2010 at 3:46 PM, Jonathan Ellis <jbel...@gmail.com>
>> > wrote:
>> >> Moving to the user@ list.
>> >>
>> >> http://wiki.apache.org/cassandra/HadoopSupport should be useful.
>> >
>> > That document doesn't really answer the "is data locality preserved"
>> > when running the map phase, but my hunch is "no".
>>
>> The answer is, "yes, as long as you have hadoop on all the cassandra
>> machines." (the case where it's easy to map cassandra locality to
>> hadoop locality :)
>
> Jonathan,
> could you please clarify this. I also cannot understand how it works. Even
> if Hadoop is deployed on all the Cassandra machines, how will Hadoop be
> aware of Cassandra's data placement (partitioning and replication)?
> Maxim
>
>
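
P.S. The "prefer the local node, otherwise take the first replica" rule described at the top of this message can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the code from ColumnFamilyRecordReader: the class name LocationPicker, the method name pickLocation, and the idea of passing the local machine's addresses in as a set of strings are all hypothetical and chosen only for the example.

```java
import java.util.Set;

// Hypothetical helper, not part of Cassandra or Hadoop. It only illustrates
// the selection rule: return a replica that is the local machine if there is
// one, otherwise fall back to the first replica in the list.
final class LocationPicker {
    private LocationPicker() {}

    /**
     * @param replicaLocations nodes that hold this split, in the order reported
     * @param localAddresses   addresses identifying the machine we are running on
     * @return the local replica if present, else the first replica
     */
    static String pickLocation(String[] replicaLocations, Set<String> localAddresses) {
        if (replicaLocations == null || replicaLocations.length == 0) {
            throw new IllegalArgumentException("a split needs at least one location");
        }
        for (String location : replicaLocations) {
            // Plain string comparison; assumes both sides use the same notation.
            if (localAddresses.contains(location)) {
                return location;
            }
        }
        // No replica is local, so reading will go over the network anyway.
        return replicaLocations[0];
    }
}
```

With replicas 10.0.0.1 and 10.0.0.2, a task running on 10.0.0.2 gets 10.0.0.2 back, while a task on a machine holding neither replica gets 10.0.0.1. A real implementation would likely need to resolve host names to addresses before comparing, which this sketch skips.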