We do this all the time. Take a look at http://wiki.apache.org/cassandra/HadoopSupport for some details - you can use MapReduce or Pig to get data out of Cassandra. If it's going to a separate Hadoop cluster, I don't think you'd need to co-locate task trackers or data nodes on your Cassandra nodes - the data would just have to be copied over the network. We also use Oozie for job scheduling, fwiw.
On Dec 23, 2011, at 9:12 AM, ravikumar visweswara wrote:

> Hello All,
>
> I have a situation to dump cassandra data to hadoop cluster for further
> analytics. Lot of other relevant data which is not present in cassandra is
> already available in hdfs for analysis. Both are independent clusters right
> now.
> Is there a suggested way to get the data periodically or continuously to HDFS
> from cassandra? Any ideas or references will be very helpful for me.
>
> Thanks and Regards
> R
