Yes, Astyanax stores the file across many rows, so it reads from many disks, 
giving you a performance advantage vs. storing each file in one row. Well, at 
least that is my understanding, so read performance "should" be really good in 
that case.
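
For reference, here is roughly what that looks like with Astyanax's chunked 
object store recipe. This is a minimal sketch from memory, so the CF name 
"chunks", the object name, and the chunk size below are placeholders; worth 
checking against the Astyanax recipes docs for your version:

    import java.io.ByteArrayOutputStream;
    import java.io.InputStream;

    import com.netflix.astyanax.Keyspace;
    import com.netflix.astyanax.recipes.storage.CassandraChunkedStorageProvider;
    import com.netflix.astyanax.recipes.storage.ChunkedStorage;
    import com.netflix.astyanax.recipes.storage.ChunkedStorageProvider;
    import com.netflix.astyanax.recipes.storage.ObjectMetadata;

    // Sketch only: assumes an already-connected Keyspace "keyspace", an
    // InputStream "in" for the file, and a CF named "chunks" (placeholder).
    ChunkedStorageProvider provider =
        new CassandraChunkedStorageProvider(keyspace, "chunks");

    // Write: the file is split into fixed-size chunks, each stored as its
    // own column, so the object ends up spread across rows and nodes.
    ObjectMetadata meta = ChunkedStorage.newWriter(provider, "myfile.bin", in)
        .withChunkSize(0x10000)   // 64KB chunks; placeholder value
        .call();

    // Read: chunks are fetched back (potentially from many disks in
    // parallel) and reassembled into the output stream.
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    ChunkedStorage.newReader(provider, "myfile.bin", out)
        .call();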

Dean

From: Michael Kjellman <mkjell...@barracuda.com>
Reply-To: user@cassandra.apache.org
Date: Tuesday, October 16, 2012 10:07 AM
To: user@cassandra.apache.org
Subject: Re: Using Cassandra to store binary files?

When we started with Cassandra in production almost two years ago, it was 
originally for the sole purpose of storing blobs in a redundant way. I ignored 
the warnings because my own tests showed it would be okay (and two years later 
it is "ok"). But if you plan on using Cassandra for more later, as we now do 
since features such as secondary indexes and CQL have matured, you can end up 
like us: stuck with a large amount of data in Cassandra that maybe could be in 
a better place. Does it work? Yes. Would I do it again? Not 100% sure. 
Compactions of these column families take forever.

Also, by default there is a 16MB limit. Yes, this is adjustable 
(thrift_max_message_length_in_mb in cassandra.yaml), but currently Thrift does 
not stream data. I didn't know that Netflix had worked around this (referring 
to Dean's reply); I'll have to look through the source to see how they are 
overcoming the limitations of the protocol. Last I read there were no plans to 
make Thrift stream. Looks like there is an issue tracking this at 
https://issues.apache.org/jira/browse/CASSANDRA-265

You might want to take a look at the following page: 
http://wiki.apache.org/cassandra/CassandraLimitations

I wanted an easy key-value store when I originally picked Cassandra. Our 
project's needs have since changed, and Cassandra now plays a more critical 
role as it has matured (since the 0.7 days). In retrospect, HDFS might have 
been the better long-term option: I will never really need indexing etc. on my 
binary blobs, and being able to grab/reassemble a file simply by its key was 
convenient at the time but maybe not the most forward thinking. Hope that 
helps a bit.

Also, your read performance won't be amazing by any means with blobs. Not sure 
if your priority is reads or writes; in our case it was writes, so it wasn't a 
big loss.

Best,
michael


From: Vasileios Vlachos <vasileiosvlac...@gmail.com>
Reply-To: user@cassandra.apache.org
Date: Tuesday, October 16, 2012 8:49 AM
To: user@cassandra.apache.org
Subject: Using Cassandra to store binary files?

Hello All,

We need to store about 40GB of binary files in a redundant way, and since we 
are already using Cassandra for other applications we were thinking we could 
solve this problem with the same Cassandra cluster. Each individual file will 
be approximately 1MB.

We are thinking the data structure should be very simple in this case: one CF 
with just one column containing the actual file, with the row key uniquely 
identifying each file. Speed is not an issue when retrieving the files; not 
impacting the other applications that use Cassandra matters more to us. To 
prevent performance issues for the other applications on our current cluster, 
we think we should disable the key_cache and row_cache for this column family.
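
For reference, here is roughly what we have in mind, sketched with Astyanax. 
This is only a minimal sketch: the CF name "files", the column name "data", 
and the variable names are placeholders, and it assumes an already-connected 
Keyspace:

    import com.netflix.astyanax.Keyspace;
    import com.netflix.astyanax.MutationBatch;
    import com.netflix.astyanax.model.ColumnFamily;
    import com.netflix.astyanax.serializers.StringSerializer;

    // One row per file: row key = file ID, a single column holds the bytes.
    ColumnFamily<String, String> CF_FILES = new ColumnFamily<String, String>(
        "files", StringSerializer.get(), StringSerializer.get());

    // Write the whole file as one column value
    MutationBatch m = keyspace.prepareMutationBatch();
    m.withRow(CF_FILES, fileId).putColumn("data", fileBytes, null);
    m.execute();

    // Read it back by key
    byte[] result = keyspace.prepareQuery(CF_FILES)
        .getKey(fileId)
        .getColumn("data")
        .execute().getResult().getByteArrayValue();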

Has anyone tried this before, or does anyone think this is going to be a bad 
idea? Do you think our current plan is sensible? Any input would be much 
appreciated. Thank you in advance.

Regards,

Vasilis
