I am using the Hadoop interface with Cassandra. Is it possible to line up 
partitions or splits of two different column families to be on the same node? I 
am doing this for data locality reasons. I want to read all the data from a 
split of column family A and a split from column family B into memory to do 
some processing.

Here is an example. Column family A has 1,000,000 rows and column family B has 
50,000,000 rows. Let's say column family A has a split every 10,000 rows and 
column family B has a split every 500,000 rows, so each has 100 splits. I want 
the first split of A and the first split of B on the same node, the second 
split of A and the second split of B on the next node, and so on.
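
To make the arithmetic concrete, here is a rough Python sketch of the pairing I 
have in mind. It is not Hadoop or Cassandra API code; the function name 
split_count is made up for illustration, and the row counts are the ones from 
the example.

```python
def split_count(total_rows: int, rows_per_split: int) -> int:
    """Number of splits needed to cover total_rows (ceiling division)."""
    return -(-total_rows // rows_per_split)


# Figures from the example: A has 1,000,000 rows, B has 50,000,000 rows.
splits_a = split_count(1_000_000, 10_000)
splits_b = split_count(50_000_000, 500_000)

# Pair split i of A with split i of B; each pair should live on one node.
pairs = list(zip(range(splits_a), range(splits_b)))
```

Both column families end up with the same number of splits, which is what makes 
a one-to-one pairing possible in this example.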

A second scenario is that the two column families use the same key. Let's 
assume the key is an integer in the range of 1 to 1,000,000. The two column 
families have a different number of rows. I would like the splits to occur at 
fixed multiples of the key value, say every 10,000. The first split would have 
keys in the range of 1 to 9,999, the second split would have keys in the range 
of 10,000 to 19,999, and so on. I still want the first split of column family A 
and the first split of column family B to be on the first node, and so on. In 
this scenario a split could be empty or very small; that is OK.
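
As a rough sketch of the key-based scheme, assuming integer keys and a split 
width of 10,000 (Python for illustration only; split_index and node_for_split 
are hypothetical names, not part of the Hadoop or Cassandra API, and the 
round-robin placement is my assumption):

```python
SPLIT_WIDTH = 10_000  # split boundary every 10,000 key values


def split_index(key: int, width: int = SPLIT_WIDTH) -> int:
    """0-based split for a key: 1..9,999 -> 0, 10,000..19,999 -> 1, ..."""
    return key // width


def node_for_split(index: int, num_nodes: int) -> int:
    """Assumed placement: split i goes to node i, wrapping around."""
    return index % num_nodes


def same_node(key_a: int, key_b: int, num_nodes: int) -> bool:
    """True if a key from A and a key from B would land on the same node."""
    return (node_for_split(split_index(key_a), num_nodes)
            == node_for_split(split_index(key_b), num_nodes))
```

Because the split depends only on the key, a row from A and a row from B with 
keys in the same range fall into the same split no matter how many rows each 
column family holds. Note that with these boundaries the key 1,000,000 falls 
into a split of its own.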
-------------
Sincerely,
David G. Boney
dbon...@semanticartifacts.com
http://www.semanticartifacts.com



