Yes, Astyanax stores the file in many rows, so it reads from many disks, giving you a performance advantage vs. storing each file in one row... well, at least from my understanding, so read performance "should" be really good in that case.
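The chunking idea can be illustrated without any client library: split the blob into fixed-size pieces and give each piece its own row key, so the pieces spread across nodes/disks and can be fetched in parallel and reassembled. A minimal sketch in plain Python (the chunk size and the `name:index` key scheme are made up for illustration, not Astyanax's actual on-disk format):

```python
# Sketch: split a blob into fixed-size chunks, one "row" per chunk.
# Chunk size and key scheme are hypothetical, not Astyanax's real format.
CHUNK_SIZE = 64 * 1024  # 64 KB per row (illustrative)

def split_into_rows(object_name: bytes, blob: bytes) -> dict:
    """Return {row_key: chunk_bytes} for every fixed-size chunk of blob."""
    rows = {}
    for offset in range(0, len(blob), CHUNK_SIZE):
        row_key = object_name + b":" + str(offset // CHUNK_SIZE).encode()
        rows[row_key] = blob[offset:offset + CHUNK_SIZE]
    return rows

def reassemble(object_name: bytes, rows: dict) -> bytes:
    """Read chunk 0, 1, 2, ... until a key is missing, then join them."""
    chunks = []
    index = 0
    while (key := object_name + b":" + str(index).encode()) in rows:
        chunks.append(rows[key])
        index += 1
    return b"".join(chunks)
```

In a real cluster each row key would hash to a different replica set, which is where the "reads from many disks" advantage comes from.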
Dean

From: Michael Kjellman <mkjell...@barracuda.com>
Reply-To: "user@cassandra.apache.org"
Date: Tuesday, October 16, 2012 10:07 AM
To: "user@cassandra.apache.org"
Subject: Re: Using Cassandra to store binary files?

When we started with Cassandra almost 2 years ago in production, it was originally for the sole purpose of storing blobs in a redundant way. I ignored the warnings, as my own tests showed it would be okay (and two years later it is "ok"). If you plan on using Cassandra for more later, think carefully: we now do, as features such as secondary indexes and CQL have matured, and I'm now stuck with a large amount of data in Cassandra that maybe could be in a better place.

Does it work? Yes. Would I do it again? Not 100% sure. Compactions of these column families take forever. Also, by default there is a 16MB limit. Yes, this is adjustable, but currently Thrift does not stream data. I didn't know that Netflix had worked around this (referring to Dean's reply); I'll have to look through the source to see how they are overcoming the limitations of the protocol. Last I read, there were no plans to make Thrift stream. Looks like there is a bug at https://issues.apache.org/jira/browse/CASSANDRA-265

You might want to take a look at the following page: http://wiki.apache.org/cassandra/CassandraLimitations

I wanted an easy key/value store when I originally picked Cassandra. As our project needs changed and Cassandra began playing a more critical role as it matured (since the 0.7 days), in retrospect HDFS might have been a better option long term. I will really never need indexing etc. on my binary blobs, and simply being able to grab/reassemble a file by its key was convenient at the time, but maybe not the most forward-thinking choice.
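The adjustable 16MB cap Michael mentions is the Thrift transport limit, set in cassandra.yaml. A sketch of the relevant settings as they appeared in 1.x-era configs (option names and defaults varied between versions, so treat these as an assumption and check your own yaml):

```yaml
# cassandra.yaml (Cassandra 1.x era) -- Thrift transport caps.
# Option names/defaults are from memory of that era; verify per version.
thrift_framed_transport_size_in_mb: 15   # max frame size per request
thrift_max_message_length_in_mb: 16      # hard cap on a Thrift message
```

Raising these lets larger blobs through in a single request, but since Thrift buffers the whole message rather than streaming it, large values mean large heap allocations per request, which is why chunking the blob client-side is usually preferred over raising the cap.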
Hope that helps a bit. Also, your read performance won't be amazing by any means with blobs. Not sure if your priority is reads or writes; in our case it was writes, so it wasn't a large loss.

Best,
michael

From: Vasileios Vlachos <vasileiosvlac...@gmail.com>
Reply-To: "user@cassandra.apache.org"
Date: Tuesday, October 16, 2012 8:49 AM
To: "user@cassandra.apache.org"
Subject: Using Cassandra to store binary files?

Hello All,

We need to store about 40GB of binary files in a redundant way, and since we are already using Cassandra for other applications, we were thinking we could solve this problem using the same Cassandra cluster. Each individual file will be approximately 1MB.

We think the data structure should be very simple for this case: one CF with just one column containing the actual file, with the row key uniquely identifying each file. Speed is not an issue when we are retrieving the files; not impacting the other applications using Cassandra is more important for us. To prevent performance issues for the other applications on our cluster, we think we should disable key_cache and row_cache for this column family.

Has anyone tried this before, or does anyone think this is going to be a bad idea? Do you think our current plan is sensible? Any input would be much appreciated. Thank you in advance.

Regards,
Vasilis
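The column family Vasileios describes (one column holding the blob, row key identifying the file, caches disabled) could be sketched in 1.1-era cassandra-cli roughly as below. The CF name, types, and the exact `caching` attribute syntax are assumptions for illustration; cache-related attribute names changed between Cassandra versions, so check the cli help for yours:

```
-- cassandra-cli sketch (Cassandra 1.1 era; names/attributes assumed)
create column family files
  with key_validation_class = UTF8Type      -- row key = file identifier
  and comparator = UTF8Type                 -- single column name, e.g. 'data'
  and default_validation_class = BytesType  -- column value = the blob
  and caching = 'NONE';                     -- keep blobs out of key/row caches
```

Disabling caching on this CF keeps 1MB values from evicting the hot rows of the other applications sharing the cluster, which matches the stated priority of not impacting them.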