Just a warning about ZFS: if the plan is to use JBOD with RAID-Z, don't.
An array of 3, 4, 5, ... or N disks in RAID-Z (using ZFS) will deliver
read performance equivalent to only 1 disk.
Check out this blog entry:
http://blogs.sun.com/relling/entry/zfs_raid_recommendations_space_performance
The secon
> Some possibilities open up when using OPP, especially with aggregate
> keys. This is more of an option when RF==cluster size, but not
> necessarily a good reason to make RF=cluster size if you haven't
> already.
This use of the OPP sounds like the way Lucandra stores data; they
want to have ran
Some possibilities open up when using OPP, especially with aggregate
keys. This is more of an option when RF==cluster size, but not
necessarily a good reason to make RF=cluster size if you haven't
already.
For example, ':' and ';' make good boundary markers in aggregate keys,
since they are alread
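The aggregate-key idea above could be sketched roughly like this (a toy Python sketch; the key components and helper names are invented for illustration, not from the original mail):

```python
# Hypothetical sketch: building aggregate row keys with ':' as a
# boundary marker, so rows sort grouped by prefix under the
# OrderPreservingPartitioner. All names here are made up.

def make_key(*parts):
    """Join key components with ':' into one aggregate row key."""
    return ":".join(parts)

def split_key(key):
    """Split an aggregate key back into its components."""
    return key.split(":")

k = make_key("customer42", "Test.doc", "0001")
assert k == "customer42:Test.doc:0001"
assert split_key(k)[0] == "customer42"
```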
Okay, that really tied a knot into my brain - it's untwisting a little bit now.
I'll have to draw it on the whiteboard to understand it better ... but I've
seen some very interesting cornerstones in your answer
for our project.
Really, thanks a lot
mike
2010/7/26 aaron morton
> I see, got carried away
I see, got carried away thinking about it, so here are some thoughts.
Your access patterns will determine the best storage design, so this is
probably not the best solution; I would welcome thoughts from others.
=> Standard CF: Chunks
* key is chunk hash
* col named 'data' col value is chunk d
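As a rough illustration of the Chunks CF layout described above (a toy in-memory model using plain dicts, not a real Cassandra client; the choice of SHA-1 is an assumption for illustration):

```python
import hashlib

# Toy model of the "Chunks" standard column family: the row key is the
# chunk hash, and a single column named 'data' holds the chunk bytes.

chunks = {}  # row key (chunk hash) -> {column name: column value}

def store_chunk(data: bytes) -> str:
    """Store a chunk under its content hash and return the row key."""
    key = hashlib.sha1(data).hexdigest()
    chunks[key] = {"data": data}
    return key

key = store_chunk(b"hello world")
assert chunks[key]["data"] == b"hello world"
```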
Hi
Wow, that was a lot of information...
Think about users storing files online (tied to their customer name) -
each customer maintains his own "hashtable" of files. Each file can consist
of a few or several thousand entries (depending on the size of the whole file).
for example:
File Test.doc c
Some background reading: http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/
Not sure on your follow-up question, so I'll just wildly blather on about things :)
My assumption of your data is you have 64K chunks that are identified by a hash, which can so
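To make that assumption concrete, here is a minimal sketch of splitting a file into 64K chunks identified by a hash (the chunk size and SHA-1 are assumptions for illustration, not from the original mail):

```python
import hashlib

# Split a file's bytes into fixed-size 64 KB chunks and identify each
# chunk by its content hash, as assumed in the discussion above.

CHUNK_SIZE = 64 * 1024

def chunk_hashes(data: bytes):
    """Yield (hash, chunk) pairs for each 64 KB slice of the data."""
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        yield hashlib.sha1(chunk).hexdigest(), chunk

data = b"x" * (CHUNK_SIZE * 2 + 10)   # 2 full chunks + a 10-byte tail
hashes = list(chunk_hashes(data))
assert len(hashes) == 3
```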
Thanks for this detailed description ...
You mentioned the secondary index in a standard column family; would it be
better to build several indexes?
Is it even possible to build an index on, for example, 32 columns?
The hint with the smaller boxes is very valuable!
Mike
2010/7/26 Aaron Morton
> For w
For what it's worth...
* Many smaller boxes with local disk storage are preferable to 2 with huge NAS storage.
* To cache the hash values look at the KeysCached setting in the storage-config.
* There are some row size limits, see http://wiki.apache.org/cassandra/CassandraLimitations
* If you wanted to g
Hi Peter
We are trying to figure out how much data will be coming into Cassandra
once in full operation mode.
Reads depend more on the hash values (the file names) for the binary
blobs - not the binary data itself.
We will try to store the hash values "grouped" by their first byte
(a-z, A-Z, 0-9).
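That grouping could look roughly like this (a hypothetical sketch; the bucket structure is invented for illustration, e.g. each bucket could become one row of an index CF):

```python
from collections import defaultdict

# Group hash values by their first character (a-z, A-Z, 0-9), as
# described above. Each bucket collects all hashes sharing that prefix.

buckets = defaultdict(list)

def add_hash(h: str):
    """File the hash under the bucket named by its first character."""
    buckets[h[0]].append(h)

for h in ["a1f3...", "b2c4...", "a9e0..."]:
    add_hash(h)

assert len(buckets["a"]) == 2
```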
Hi Jonathan
Thanks for your very valuable input on this.
I maybe didn't give enough explanation - so I'll try to clarify.
Here are some thoughts:
- Binary data will not be indexed - only stored.
- The file name of the binary data (a hash) should be indexed for search.
- We could group the ha
> We plan to use Cassandra as a data store on at least 2 nodes with RF=2
> for about 1 billion small files.
> We have about 48TB of disk space behind each node.
>
> Now my question is: is this possible with Cassandra, reliably - meaning
> every blob is stored on 2 JBODs?
>
> we may grow up
There are two scaling factors to consider here. In general the worst
case growth of operations in Cassandra is kept near to O(log2(N)). Any
worse growth would be considered a design problem, or at least a high
priority target for improvement. This is important for considering
the load generated by
Hi
We plan to use Cassandra as a data store on at least 2 nodes with RF=2
for about 1 billion small files.
We have about 48TB of disk space behind each node.
Now my question is: is this possible with Cassandra, reliably - meaning
every blob is stored on 2 JBODs?
we may grow up to nearly