We do this all the time. Take a look at
http://wiki.apache.org/cassandra/HadoopSupport for details. You can use
MapReduce or Pig to get data out of Cassandra. If the data is going to a
separate Hadoop cluster, I don't think you'd need to co-locate task trackers
or data nodes on your Cassandra nodes; the job would just have to copy the
data over the network. We also use Oozie for job scheduling, fwiw.

On Dec 23, 2011, at 9:12 AM, ravikumar visweswara wrote:

> Hello All,
> 
> I need to dump Cassandra data to a Hadoop cluster for further
> analytics. A lot of other relevant data that is not present in Cassandra is
> already available in HDFS for analysis. The two are independent clusters
> right now.
> Is there a suggested way to get the data periodically or continuously into
> HDFS from Cassandra? Any ideas or references would be very helpful.
> 
> Thanks and Regards
> R
