Hi all,

The problem I am trying to address is: store raw files (XML files of around 700 MB each) in Cassandra, later fetch and process them in a Hadoop cluster, and write the processed data back to Cassandra. Regarding this, I wanted a few clarifications:
1) The FAQ (https://wiki.apache.org/cassandra/FAQ#large_file_and_blob_storage) says I can only store files of around 64 MB, but it also points to a JIRA issue (https://issues.apache.org/jira/browse/CASSANDRA-16) that was resolved back in version 0.6. So, in the current version of Cassandra (2.0.11), is there still a limit on the size of a file stored in a column, and if so, what is it?

2) Can I replace HDFS with Cassandra, so that I don't have to sync/fetch the files from Cassandra to HDFS when I want to process them in the Hadoop cluster?

Regards,
Seenu.
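P.S. For context on (1): if the limit still applies, the workaround I'd have to implement is to split each file into column-sized chunks keyed by (file_id, chunk_index) and reassemble on read. A rough sketch of that chunking logic (the names, the 16 MB chunk size, and the dict standing in for the actual Cassandra table are all just illustrative, not a Cassandra API):

```python
import io

# Assumed safe per-column size, well under the ~64 MB FAQ guidance.
CHUNK_SIZE = 16 * 1024 * 1024

def chunk_file(stream, chunk_size=CHUNK_SIZE):
    """Yield (chunk_index, bytes) pairs for a binary stream."""
    index = 0
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        yield index, data
        index += 1

# Simulated storage: in practice each chunk would be one INSERT into a
# table keyed by (file_id, chunk_index); here a plain dict stands in.
store = {}
payload = io.BytesIO(b"x" * (40 * 1024 * 1024))  # pretend 40 MB file
for i, data in chunk_file(payload):
    store[("file-1", i)] = data

# Reassembly on read: fetch all chunks for the file in index order.
indices = sorted(k[1] for k in store if k[0] == "file-1")
reassembled = b"".join(store[("file-1", i)] for i in indices)
assert len(reassembled) == 40 * 1024 * 1024
```

With a 700 MB file this would produce around 44 chunks at 16 MB each, which is why I'd prefer to avoid it if a single column can hold the whole file.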