Hello Users,

I am planning a system where both metadata and data will be stored. Usually the data will be a small file, such as a Word document, along with some specific information about that file. Sometimes there will be a large file, possibly a few hundred megabytes to a gigabyte, such as video. I have read a lot about suggested methods for large file storage within Cassandra, but I want to verify my thoughts on the method of implementation before I start working on it.
On June 29, 2009 Jonathan filed the task on JIRA (https://issues.apache.org/jira/browse/CASSANDRA-265), but closed it, stating that it was not on anyone's roadmap.

On April 26, 2010 there was a posting to this group from Shuge Lee stating: "During compaction, as is well noted, Cassandra needs the entire row in memory, which will cause a FAIL once you have files more than a few gigs."

Currently, the wiki has an entry explaining the handling, or more appropriately the workaround, for large BLOBs (http://wiki.apache.org/cassandra/FAQ#large_file_and_blob_storage).

Seeing as native support for large files is not expected, given that the wiki states files <= 64MB can easily be stored within the database, and knowing that during compaction the entire row is loaded into memory:

1) Is the appropriate way to handle files that vary greatly in size (1 KB to a few GB) to break the data into smaller "chunks" and then store each chunk in a separate row? (A rough sketch of what I mean is appended below.)
   A) If so, how should it be done to achieve the best read/write performance?
   B) Is there a row size that should be considered a "sweet spot", or should it be tunable on a per-cluster basis?
2) Does anyone foresee large blob support in the near future?

Thanks,
- Lucas
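P.S. For concreteness, here is a minimal sketch of the chunking scheme I have in mind. It is Python pseudocode for the layout only: store_chunk() and the in-memory dict are placeholders standing in for whatever client call would actually write a row, and the row-key format and 4 MB chunk size are just my assumptions, not anything from the wiki.

    import hashlib

    CHUNK_SIZE = 4 * 1024 * 1024  # assumed 4 MB per row; the "sweet spot" is part of my question

    chunk_store = {}  # stand-in for a column family, keyed by row key

    def store_chunk(row_key, data):
        # In the real system this would be a Cassandra insert; a dict keeps the sketch runnable.
        chunk_store[row_key] = data

    def store_file(file_id, path, chunk_size=CHUNK_SIZE):
        """Split a file into fixed-size chunks and write each chunk as its own row.

        Row keys are '<file_id>:<chunk_index>', so a reader can stream the chunks
        back in order. Returns (chunk_count, sha1) so the metadata row can record
        how many chunks to fetch and a checksum to verify reassembly.
        """
        digest = hashlib.sha1()
        index = 0
        with open(path, 'rb') as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                digest.update(chunk)
                store_chunk('%s:%06d' % (file_id, index), chunk)
                index += 1
        return index, digest.hexdigest()

The idea is that no single row ever exceeds the chunk size, so compaction never has to hold an entire multi-gigabyte file in memory; whether that is the right approach, and what chunk size to use, is exactly what I am asking about.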