Thanks Hiller and Shamim.
Let me share more details. I want to use Cassandra MapReduce to calculate some
KPIs on data that is written to Cassandra continuously. Fetching the whole
dataset from Cassandra on every run therefore seems like an overhead to me.
The row key I'm using is of the form "(timestamp/6)_otherid".
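Since the row key already encodes a time bucket, one way to avoid fetching everything is to enumerate only the row keys whose bucket falls inside the window you are computing KPIs for. A minimal Python sketch of that idea (the helper names, the assumption that the timestamp is an integer whose division by 6 yields the bucket, and the example ids are all illustrative, not from this thread):

```python
def bucket_for(timestamp):
    # "(timestamp/6)_otherid": integer-divide the timestamp to get the bucket.
    return timestamp // 6

def row_keys_for_window(start_ts, end_ts, other_ids):
    # Enumerate only the row keys whose time bucket falls inside
    # [start_ts, end_ts], instead of scanning the whole column family.
    keys = []
    for bucket in range(bucket_for(start_ts), bucket_for(end_ts) + 1):
        for other_id in other_ids:
            keys.append("%d_%s" % (bucket, other_id))
    return keys

# A window spanning timestamps 12..19 covers buckets 2 and 3 for each id.
print(row_keys_for_window(12, 19, ["a", "b"]))
# → ['2_a', '2_b', '3_a', '3_b']
```

The job can then restrict its input to these keys rather than iterating over every row in the column family.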
You can use Apache Pig to load the data and filter it by row key; filtering in
Pig is very fast.
Regards
Shamim
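To make the filter-by-row-key idea concrete outside of Pig, here is a small Python sketch of the same predicate over (row key, columns) pairs; in Pig this would be a FILTER over the loaded relation, and the tuple shape used here is an illustrative assumption:

```python
def filter_by_bucket(rows, wanted_buckets):
    # rows: iterable of (row_key, columns) pairs, with row_key shaped like
    # "<bucket>_<otherid>". Keep only rows whose bucket prefix is wanted --
    # the same predicate a Pig FILTER by row key would apply.
    wanted = set(str(b) for b in wanted_buckets)
    for row_key, columns in rows:
        bucket = row_key.split("_", 1)[0]
        if bucket in wanted:
            yield (row_key, columns)

rows = [("2_a", {"v": 1}), ("3_b", {"v": 2}), ("7_c", {"v": 3})]
print(list(filter_by_bucket(rows, [2, 3])))
# → [('2_a', {'v': 1}), ('3_b', {'v': 2})]
```

Filtering on the leading bucket of the key is cheap because it only inspects the key string, never the column data.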
11.12.2012, 20:46, "Ayush V.":
> I'm working on Cassandra Hadoop integration (MapReduce). We have used
> RandomPartitioner to insert data to gain faster writes. Now we have to read that data
You may want to look into CQL3, as I hear there may be a way to specify the
query so that only the matching rows are map/reduced. I am not sure whether that
is out yet, but I remember someone from DataStax telling me about it.
Dean
On 12/11/12 9:46 AM, "Ayush V." wrote:
>I'm working on Cassandra Hadoop in