Hi all,

The problem I am trying to address is: store raw files (XML files of around 700 MB each) in Cassandra, later fetch and process them in a Hadoop cluster, and write the processed data back to Cassandra. Regarding this, I wanted a few clarifications:
1) The FAQ (https://wiki.apache.org/cassandra/FAQ#large_file_and_blob_storage) says I can only store files of around 64 MB, but it also points to a JIRA issue (https://issues.apache.org/jira/browse/CASSANDRA-16) that was resolved back in version 0.6. So, in the current version of Cassandra (2.0.11), is there still a limit on the size of a file stored in a column, and if so, what is it?

2) Can I replace HDFS with Cassandra, so that I don't have to sync/fetch the files from Cassandra to HDFS when I want to process them in the Hadoop cluster?

Regards,
Seenu.
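P.S. For context on (1): if the limit still applies, the workaround I'd have to implement is to split each file into column-sized chunks keyed by (file_id, chunk_index) and reassemble on read. A rough sketch of that chunking logic (the names, the 16 MB chunk size, and the dict standing in for the actual Cassandra table are all just illustrative, not a Cassandra API):

```python
import io

# Assumed safe per-column size, well under the ~64 MB FAQ guidance.
CHUNK_SIZE = 16 * 1024 * 1024

def chunk_file(stream, chunk_size=CHUNK_SIZE):
    """Yield (chunk_index, bytes) pairs for a binary stream."""
    index = 0
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        yield index, data
        index += 1

# Simulated storage: in practice each chunk would be one INSERT into a
# table keyed by (file_id, chunk_index); here a plain dict stands in.
store = {}
payload = io.BytesIO(b"x" * (40 * 1024 * 1024))  # pretend 40 MB file
for i, data in chunk_file(payload):
    store[("file-1", i)] = data

# Reassembly on read: fetch all chunks for the file in index order.
indices = sorted(k[1] for k in store if k[0] == "file-1")
reassembled = b"".join(store[("file-1", i)] for i in indices)
assert len(reassembled) == 40 * 1024 * 1024
```

With a 700 MB file this would produce around 44 chunks at 16 MB each, which is why I'd prefer to avoid it if a single column can hold the whole file.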