Hello Users,

I am planning a system where both metadata and data will be stored.  Usually
these will be small files, such as Word documents, along with some specific
data about each file.  Sometimes there will be a large file, possibly a few
hundred megabytes to a gigabyte, such as video.  I have read a lot about
suggested methods for large file storage within Cassandra, but I want to
verify my thoughts on the method of implementation before I start working on
it.

On June 29, 2009 Jonathan opened a task for this on JIRA (
https://issues.apache.org/jira/browse/CASSANDRA-265), but closed it stating
that it was not on anyone's roadmap.

On April 26, 2010 Shuge Lee posted to this group stating: "During
compaction, as is well noted, Cassandra needs the entire row in memory,
which will cause a FAIL once you have files more than a few gigs."

Currently, the Wiki has an entry explaining the handling of, or more
accurately, the workaround for, large BLOBs (
http://wiki.apache.org/cassandra/FAQ#large_file_and_blob_storage).

Given that native support for large files is not expected, that the Wiki
states files <= 64MB can easily be stored within the database, and that
during compaction the entire row will be loaded into memory...


1) Is the appropriate way to handle files that greatly vary in size (1KB to
a few GB) to break the data into smaller "chunks" and then store each chunk
in a separate row? (A rough sketch of what I mean follows question 2 below.)
    A) If so, how should it be done to accomplish the best read/write
results?
    B) Is there a chunk/row size that should be considered a "sweet spot",
or should it be configurable on a per-cluster basis?
2) Does anyone foresee large blob support in the near future?
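
To make question 1 concrete, here is roughly the write path I have in mind.
This is only a Python sketch of the chunking idea: the client.insert() call,
the column names, and the 8 MB chunk size are placeholders of my own, not
any particular Cassandra client's API.

import hashlib

CHUNK_SIZE = 8 * 1024 * 1024  # placeholder chunk size; one chunk per row

def store_file(client, path):
    """Split a file into fixed-size chunks and store one chunk per row."""
    file_id = hashlib.sha1(path.encode("utf-8")).hexdigest()
    chunk_keys = []
    with open(path, "rb") as f:
        index = 0
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            row_key = "%s:%06d" % (file_id, index)
            client.insert(row_key, {"data": chunk})  # hypothetical insert call
            chunk_keys.append(row_key)
            index += 1
    # Metadata row listing the chunk rows so reads can reassemble the file.
    client.insert(file_id, {"name": path, "chunks": ",".join(chunk_keys)})
    return file_id

Reads would then fetch the metadata row first and stream the chunk rows back
in order, which keeps any single row well under the compaction limit.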

Thanks,

- Lucas
