Hello,

Thank you all for your responses.
Performance is not an issue at all, as I described, so it shouldn't be
problematic. At least that is our current understanding. We will try it and
post back if anything interesting comes up. Many thanks.

Regards,

Vasilis

On Tue, Oct 16, 2012 at 7:34 PM, Hiller, Dean <dean.hil...@nrel.gov> wrote:

> I am not sure. If I were to implement it myself, though, I would probably
> have...
>
> postfixed the rows with 1,2,3,4,...<lastValue> and then stored the lastValue
> in the first row, so my program knows all the rows.
>
> I.e. I am not sure an index is really needed in that case.
>
> Dean
>
> On 10/16/12 11:45 AM, "Michael Kjellman" <mkjell...@barracuda.com> wrote:
>
> >Ah, so they just wrote chunking into Astyanax? Do they create an index
> >somewhere so they know how to reassemble the file on the way out?
> >
> >On 10/16/12 10:36 AM, "Hiller, Dean" <dean.hil...@nrel.gov> wrote:
> >
> >>Yes, Astyanax stores the file in many rows, so it reads from many disks,
> >>giving you a performance advantage vs. storing each file in one row...
> >>well, at least from my understanding, so read performance "should" be
> >>really, really good in that case.
> >>
> >>Dean
> >>
> >>From: Michael Kjellman <mkjell...@barracuda.com>
> >>Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> >>Date: Tuesday, October 16, 2012 10:07 AM
> >>To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> >>Subject: Re: Using Cassandra to store binary files?
> >>
> >>When we started with Cassandra almost 2 years ago, it was originally in
> >>production for the sole purpose of storing blobs in a redundant way. I
> >>ignored the warnings, as my own tests showed it would be okay (and two
> >>years later it is "ok").
> >>We now use Cassandra for much more, and as features such as secondary
> >>indexes and CQL have matured, I'm stuck with a large amount of data in
> >>Cassandra that maybe could be in a better place. Does it work? Yes.
> >>Would I do it again? Not 100% sure. Compactions of these column families
> >>take forever.
> >>
> >>Also, by default there is a 16MB limit. Yes, this is adjustable, but
> >>currently Thrift does not stream data. I didn't know that Netflix had
> >>worked around this (referring to Dean's reply); I'll have to look
> >>through the source to see how they are overcoming the limitations of the
> >>protocol. Last I read, there were no plans to make Thrift stream. Looks
> >>like there is a bug at
> >>https://issues.apache.org/jira/browse/CASSANDRA-265
> >>
> >>You might want to take a look at the following page:
> >>http://wiki.apache.org/cassandra/CassandraLimitations
> >>
> >>I wanted an easy key-value store when I originally picked Cassandra. As
> >>our project needs changed and Cassandra began playing a more critical
> >>role as it matured (since the 0.7 days), in retrospect HDFS might have
> >>been a better option long term: I will never really need indexing etc.
> >>on my binary blobs, and the convenience of simply being able to
> >>grab/reassemble a file by its key was handy at the time but maybe not
> >>the most forward-thinking. Hope that helps a bit.
> >>
> >>Also, your read performance won't be amazing by any means with blobs.
> >>Not sure if your priority is reads or writes. In our case it was writes,
> >>so it wasn't a large loss.
> >>
> >>Best,
> >>michael
> >>
> >>From: Vasileios Vlachos <vasileiosvlac...@gmail.com>
> >>Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> >>Date: Tuesday, October 16, 2012 8:49 AM
> >>To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> >>Subject: Using Cassandra to store binary files?
> >>
> >>Hello All,
> >>
> >>We need to store about 40GB of binary files in a redundant way, and
> >>since we are already using Cassandra for other applications we were
> >>thinking that we could solve that problem with the same Cassandra
> >>cluster. Each individual file will be approximately 1MB.
> >>
> >>We think the data structure should be very simple for this case: one CF
> >>with just one column, which will contain the actual files. The row key
> >>then uniquely identifies each file. Speed is not an issue when we
> >>retrieve the files; not impacting the other applications using Cassandra
> >>is more important for us. To prevent performance issues for the other
> >>applications using our Cassandra cluster at the moment, we think we
> >>should disable the key cache and row cache for this column family.
> >>
> >>Has anyone tried this before, or does anyone think it is going to be a
> >>bad idea? Do you think our current plan is sensible? Any input would be
> >>much appreciated. Thank you in advance.
> >>
> >>Regards,
> >>
> >>Vasilis
> >>
> >>----------------------------------
> >>'Like' us on Facebook for exclusive content and other resources on all
> >>Barracuda Networks solutions.
> >>Visit http://barracudanetworks.com/facebook
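The row-postfix scheme Dean describes above (chunk rows numbered 1,2,3,...,
with the last value stored where the reader can find it) can be sketched as
follows. This is a minimal illustration only, not the Astyanax implementation:
the key format, the "meta" row, and the in-memory dict standing in for a
Cassandra column family are all assumptions made for the sketch.

```python
# A plain dict stands in for the column family: row key -> bytes.
store = {}

CHUNK_SIZE = 1024 * 1024  # 1 MB per row, matching the expected file size

def put_file(file_id: str, data: bytes) -> None:
    # Split the blob into fixed-size chunks (at least one, even if empty).
    chunks = [data[i:i + CHUNK_SIZE]
              for i in range(0, len(data), CHUNK_SIZE)] or [b""]
    # Store the chunk count ("lastValue") in a metadata row so the reader
    # knows how many chunk rows exist; no separate index is needed.
    store[f"{file_id}/meta"] = str(len(chunks)).encode()
    # Store the chunks under rows postfixed 1, 2, 3, ...
    for i, chunk in enumerate(chunks, start=1):
        store[f"{file_id}/{i}"] = chunk

def get_file(file_id: str) -> bytes:
    # Read the count, then fetch and reassemble the chunk rows in order.
    n = int(store[f"{file_id}/meta"])
    return b"".join(store[f"{file_id}/{i}"] for i in range(1, n + 1))
```

With 1MB chunks, a ~2MB file lands in three rows plus the metadata row, so
reads can fan out across the nodes owning those row keys, which is the read
advantage Dean mentions.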