I am using the Hadoop interface with Cassandra. Is it possible to line up the partitions or splits of two different column families so that they are on the same node? I want this for data locality reasons: I want to read all the data from a split of column family A and a split of column family B into memory to do some processing.
Here is an example. Column family A has 1,000,000 rows and column family B has 50,000,000 rows. Let's say column family A has a split every 10,000 rows and column family B has a split every 500,000 rows. I want the first split of A and the first split of B on the same node, the second split of A and the second split of B on the next node, and so on.

A second scenario is that the two column families use the same key. Let's assume the key is an integer in the range of 1 to 1,000,000. The two column families have a different number of rows. I would like the splits to occur at certain multiples of the key value, say every 10,000. The first split would have keys in the range of 1 to 9,999. The second split would have keys in the range of 10,000 to 19,999, and so on. I still want the first split of column family A and the first split of column family B to be on the first node, and so on. In this scenario a split could be empty or very small; that is OK.

-------------
Sincerely,
David G. Boney
dbon...@semanticartifacts.com
http://www.semanticartifacts.com
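
For illustration, the key-range scheme in the second scenario can be sketched as plain arithmetic. This is not Cassandra or Hadoop API code. The names `SPLIT_WIDTH`, `NODE_COUNT`, `split_index`, and `node_for` are hypothetical, and the round-robin assignment of splits to nodes is an assumption made only to show that the same key lands on the same node for both column families.

```python
# Sketch only: maps an integer key to a split and a node.
# SPLIT_WIDTH matches the "every 10,000" example in the text.
# NODE_COUNT and the round-robin placement are assumptions.
SPLIT_WIDTH = 10_000
NODE_COUNT = 4


def split_index(key: int) -> int:
    """Zero-based split number: keys 1..9,999 -> 0, 10,000..19,999 -> 1."""
    return key // SPLIT_WIDTH


def node_for(key: int) -> int:
    """Zero-based node number, assuming splits are placed round-robin."""
    return split_index(key) % NODE_COUNT


if __name__ == "__main__":
    # The mapping depends only on the key, so a row of column family A
    # and a row of column family B with the same key get the same node.
    for key in (1, 9_999, 10_000, 19_999, 1_000_000):
        print(key, split_index(key), node_for(key))
```

Because the mapping ignores how many rows each column family actually holds, a split may be empty or very small, which is acceptable here.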