Re: Cassandra to store 1 billion small 64KB Blobs

2010-07-27 Thread Bryan Whitehead
Just a warning about ZFS. If the plan is to use JBOD w/RAID-Z, don't. 3, 4, 5, ... or N disks in a RAID-Z array (using ZFS) will result in read performance equivalent to only 1 disk. Check out this blog entry: http://blogs.sun.com/relling/entry/zfs_raid_recommendations_space_performance The secon

Re: Cassandra to store 1 billion small 64KB Blobs

2010-07-27 Thread aaron morton
> Some possibilities open up when using OPP, especially with aggregate keys. This is more of an option when RF==cluster size, but not necessarily a good reason to make RF=cluster size if you haven't already.
This use of the OPP sounds like the way Lucandra stores data; they want to have ran

Re: Cassandra to store 1 billion small 64KB Blobs

2010-07-26 Thread Jonathan Shook
Some possibilities open up when using OPP, especially with aggregate keys. This is more of an option when RF==cluster size, but not necessarily a good reason to make RF=cluster size if you haven't already. For example, ':' and ';' make good boundary markers in aggregate keys, since they are alread
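To picture the boundary-marker idea: ':' (0x3A) and ';' (0x3B) are adjacent in ASCII, so in a byte-ordered key space every key beginning with "prefix:" sorts at or after "prefix:" and before "prefix;". The sketch below simulates that with a sorted Python list; it is not a Cassandra range query, and the customer:file:chunk key layout and all names are hypothetical.

```python
from bisect import bisect_left

def prefix_range(sorted_keys, prefix):
    """Return all keys that start with prefix + ':'.

    Assumes sorted_keys is in byte order, as an order-preserving
    partitioner would keep them. This is a local simulation, not a
    Cassandra API call.
    """
    start = prefix + ":"   # lowest possible key with this prefix
    end = prefix + ";"     # ';' is the character right after ':' in ASCII
    lo = bisect_left(sorted_keys, start)
    hi = bisect_left(sorted_keys, end)
    return sorted_keys[lo:hi]

# Hypothetical aggregate keys: customer:file:chunk-index
keys = sorted([
    "cust1:a.doc:0000",
    "cust1:a.doc:0001",
    "cust1:b.doc:0000",
    "cust10:a.doc:0000",
    "cust2:a.doc:0000",
])

print(prefix_range(keys, "cust1"))        # cust1's keys only, not cust10's
print(prefix_range(keys, "cust1:a.doc"))  # only the chunks of cust1's a.doc
```

Because the marker is part of the range bounds, "cust10" never leaks into the "cust1" slice, which a plain string-prefix range without a delimiter would not guarantee.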

Re: Cassandra to store 1 billion small 64KB Blobs

2010-07-26 Thread Michael Widmann
Okay. That really tied a knot in my brain; it is twisting a little now. I'll have to draw that on the whiteboard to understand it better ... but I've seen some very interesting cornerstones in your answer for our project. Really, thanks a lot. Mike 2010/7/26 aaron morton > I see, got carried away

Re: Cassandra to store 1 billion small 64KB Blobs

2010-07-26 Thread aaron morton
I see, got carried away thinking about it, so here are some thoughts. Your access patterns will determine the best storage design, so this is probably not the best solution. I would welcome thoughts from others.
=> Standard CF: Chunks
* key is chunk hash
* col named 'data', col value is chunk d
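The "Chunks" layout above (row key is the chunk hash, one column named 'data') can be sketched with a plain dict standing in for the column family. This is only an illustration: SHA-1 is an assumption (the thread does not name the hash), and `store_file` and `load_file` are hypothetical helpers, not Cassandra client calls.

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # 64 KiB, matching the blob size in the thread subject

def store_file(chunks_cf, data):
    """Split data into chunks and store each under its hash.

    chunks_cf is a plain dict standing in for the 'Chunks' column family:
    row key -> {column name -> column value}. SHA-1 is an assumption; the
    thread does not say which hash is used.
    Returns the ordered list of chunk hashes needed to rebuild the file.
    """
    hashes = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        key = hashlib.sha1(chunk).hexdigest()
        # Identical chunks map to the same row, so they are stored once.
        chunks_cf.setdefault(key, {"data": chunk})
        hashes.append(key)
    return hashes

def load_file(chunks_cf, hashes):
    """Rebuild a file from its ordered chunk hashes."""
    return b"".join(chunks_cf[h]["data"] for h in hashes)

chunks = {}
payload = b"A" * CHUNK_SIZE * 2 + b"B" * 100   # two identical chunks plus a tail
manifest = store_file(chunks, payload)
print(len(manifest), "chunk references,", len(chunks), "stored rows")
```

Keying rows by content hash means repeated chunks cost one row no matter how many files reference them; the per-file list of hashes then has to live somewhere else.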

Re: Cassandra to store 1 billion small 64KB Blobs

2010-07-25 Thread Michael Widmann
Hi, Wow, that was a lot of information... Think about users storing files online (that is, under their customer name): each customer maintains his own "hashtable" of files. Each file can consist of a few or several thousand entries (depending on the size of the whole file). For example: File Test.doc c

Re: Cassandra to store 1 billion small 64KB Blobs

2010-07-25 Thread Aaron Morton
Some background reading.. http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/ Not sure on your follow up question, so I'll just wildly blather on about things :) My assumption of your data is you have 64K chunks that are identified by a hash, which can so

Re: Cassandra to store 1 billion small 64KB Blobs

2010-07-25 Thread Michael Widmann
Thanks for this detailed description ... You mentioned the secondary index in a standard column; would it be better to build several indexes? Is it even possible to build an index on, for example, 32 columns? The hint about the smaller boxes is very valuable! Mike 2010/7/26 Aaron Morton > For w

Re: Cassandra to store 1 billion small 64KB Blobs

2010-07-25 Thread Aaron Morton
For what it's worth...
* Many smaller boxes with local disk storage are preferable to 2 with huge NAS storage.
* To cache the hash values look at the KeysCached setting in the storage-config
* There are some row size limits, see http://wiki.apache.org/cassandra/CassandraLimitations
* If you wanted to g

Re: Cassandra to store 1 billion small 64KB Blobs

2010-07-24 Thread Michael Widmann
Hi Peter, We are trying to figure out how much data will come in to Cassandra once in full operation mode. Reads depend more on the hash values (the file name) for the binary blobs, not on the binary data itself. We will try to store hash values "grouped" (based on their first byte (a-z,A-Z,0-9)

Re: Cassandra to store 1 billion small 64KB Blobs

2010-07-23 Thread Michael Widmann
Hi Jonathan, Thanks for your very valuable input on this. Maybe I didn't give enough explanation, so I'll try to clarify. Here are some thoughts:
- binary data will not be indexed - only stored.
- The file name to the binary data (a hash) should be indexed for search
- We could group the ha

Re: Cassandra to store 1 billion small 64KB Blobs

2010-07-23 Thread Peter Schuller
> We plan to use cassandra as a data storage on at least 2 nodes with RF=2
> for about 1 billion small files.
> We do have about 48TB discspace behind for each node.
>
> now my question is - is this possible with cassandra - reliable - means
> (every blob is stored on 2 jbods)..
>
> we may grow up

Re: Cassandra to store 1 billion small 64KB Blobs

2010-07-23 Thread Jonathan Shook
There are two scaling factors to consider here. In general the worst case growth of operations in Cassandra is kept near to O(log2(N)). Any worse growth would be considered a design problem, or at least a high priority target for improvement. This is important for considering the load generated by
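To get a feel for what O(log2(N)) growth means at the sizes discussed in this thread, here is a small idealized calculation. It ignores constant factors, caching, and disk behaviour, and `relative_cost` is a hypothetical helper used only for illustration.

```python
import math

def relative_cost(n_items):
    """Idealized O(log2(N)) step count; constants and I/O are ignored."""
    return math.log2(n_items)

for n in (10**6, 10**9, 10**10):
    print(f"{n:>14,d} items -> about {relative_cost(n):.1f} steps")
```

Under this idealization, going from a million to a billion items (a factor of 1000) adds only about ten steps, since log2(1000) is just under 10.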

Cassandra to store 1 billion small 64KB Blobs

2010-07-23 Thread Michael Widmann
Hi, We plan to use Cassandra as data storage on at least 2 nodes with RF=2 for about 1 billion small files. We have about 48TB of disk space behind each node. Now my question is: is this possible with Cassandra, reliably (meaning every blob is stored on 2 JBODs)? We may grow up to nearly
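A rough back-of-the-envelope check of the numbers in this post, as a sketch only. It assumes every blob is a full 64 KiB (the worst case), that the 48TB figure is in decimal terabytes, and that data spreads evenly; it ignores all per-row and on-disk overhead.

```python
BLOB_COUNT = 1_000_000_000        # "about 1 billion small files"
BLOB_BYTES = 64 * 1024            # assumes every blob is a full 64 KiB (worst case)
REPLICATION_FACTOR = 2            # RF=2 from the post
NODE_COUNT = 2                    # "at least 2 nodes"
NODE_CAPACITY_TB = 48             # assumes decimal terabytes

raw_bytes = BLOB_COUNT * BLOB_BYTES
replicated_bytes = raw_bytes * REPLICATION_FACTOR
per_node_tb = replicated_bytes / NODE_COUNT / 1e12

print(f"raw data:   {raw_bytes / 1e12:.1f} TB")
print(f"with RF=2:  {replicated_bytes / 1e12:.1f} TB")
print(f"per node:   {per_node_tb:.1f} TB of {NODE_CAPACITY_TB} TB")
```

With RF=2 on exactly 2 nodes, each node holds a full copy, so under these worst-case assumptions the per-node figure comes out above 48 TB. If most blobs are smaller than 64 KiB, or more nodes are added, the result changes accordingly.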
