> Is it advisable or ok to store photos, images and docs in cassandra where you
> expect high volume of uploads and views?
To diverge a bit from the direction the thread is going:

You can definitely store large files in Cassandra. I would recommend against doing so by simply smacking entire files into column values, simply because the architecture is such that columns are assumed to be reasonably sized (lots of them fitting in memory, lots of temporary columns being okay to create, etc.).

Off the top of my head, my starting point would be using one row per file and splitting the actual content up into columns. For larger files you may wish to consider splitting into multiple rows so that even individual files get spread across the cluster (this avoids single very large files causing out-of-disk or performance problems on an individual node, and allows an individual file to enjoy scaling out for performance).

However, all that is just a matter of choosing a representation of the data in Cassandra that suits the use case. I think the more real and bigger issue is what you're looking for in terms of efficiency. I wouldn't necessarily call Cassandra the most efficient way to store large blobs, just because compaction will be a lot more expensive in relative terms than when used for small individual items of data. On the other hand, Cassandra should shine in giving you reasonably efficient random access to subranges of files, while still allowing you to easily write file data in a non-coordinated fashion (concurrency across subranges).

There are non-trivial trade-offs. If you were to store, say, predominantly 5-50 MB files and you had no desire beyond just storing them as single large blobs, a local storage model with one file per blob would be much more efficient, assuming each individual blob could be streamed to the client.

Bottom line, I think the two primary potential concerns would be:

Are you looking at a *lot* of writes?
Write overhead in terms of throughput and disk I/O should be larger than for your typical database with small "things" (regardless of row/column/supercolumn division) being written.

The other thing is that if compaction becomes I/O bound rather than CPU bound, you may have bigger issues with read latency than otherwise.

Regardless, I don't think focusing on whether or not it's a good idea to have a huge single column is the right approach to the problem, since that's more about using the Cassandra data model appropriately.

--
/ Peter Schuller
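
For what it's worth, here is a rough sketch of the row/column layout described above (one row per file, content split into columns, spilling into further rows for larger files). It only computes the layout and makes no Cassandra client calls. The chunk size, the number of chunks per row, the `file_id:row_no` row key format and both function names are assumptions made up for illustration, not anything Cassandra prescribes.

```python
from typing import Iterator, List, Tuple

CHUNK_SIZE = 64 * 1024   # bytes per column value (assumed; tune for your workload)
CHUNKS_PER_ROW = 256     # columns per row before spilling to the next row (assumed)


def chunk_layout(file_id: str, data: bytes,
                 chunk_size: int = CHUNK_SIZE,
                 chunks_per_row: int = CHUNKS_PER_ROW
                 ) -> Iterator[Tuple[str, int, bytes]]:
    """Yield (row_key, column_name, value) for each chunk of the file.

    The row key embeds the file id and a row sequence number (hypothetical
    format), so a large file spans several rows. The column name is the
    chunk's index within its row.
    """
    for index, start in enumerate(range(0, len(data), chunk_size)):
        row_no, col_no = divmod(index, chunks_per_row)
        yield f"{file_id}:{row_no}", col_no, data[start:start + chunk_size]


def chunks_for_range(file_id: str, offset: int, length: int,
                     chunk_size: int = CHUNK_SIZE,
                     chunks_per_row: int = CHUNKS_PER_ROW
                     ) -> List[Tuple[str, int]]:
    """Return the (row_key, column_name) pairs covering bytes
    [offset, offset + length), i.e. what a subrange read would fetch."""
    if length <= 0:
        return []
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    return [(f"{file_id}:{i // chunks_per_row}", i % chunks_per_row)
            for i in range(first, last + 1)]
```

Since every chunk has its own (row key, column name) address, independent writers can fill in different subranges without coordinating, and a reader only needs the columns that `chunks_for_range` returns. Whether the rows of one file actually land on different nodes depends on the partitioner in use.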