Hi,

Wow, that was a lot of information...
Think about users storing files online (identified by their customer name) - each customer maintains his own "hashtable" of files. Each file can consist of a few or several thousand entries, depending on the size of the whole file.

For example: the file Test.doc consists of 3 * 64K blobs, and each blob has the hash value ABC. So we only store one blob, one hash value, and the entry that this blob is needed 3 times - that way we avoid duplicate file data (and there are a lot of duplicates).

This means our model is:

Filename: Test.doc
Hashes: 3
Hash 1: ABC
Hash 2: ABC
Hash 3: ABC

BLOB: ABC
Used: 3
Binary: Data 1*

Each customer has:

Customer: Customer:ID
  Filename: Test.doc
    MetaData (Path / Accessed / Modified / Size / Compression / OS-Type from / Security)
    Version: 0
      Hash 1 / Hash 2 / Hash 3
  Filename: Another.doc
    MetaData (Path / Accessed / Modified / Size / Compression / OS-Type from / Security)
    Version: 0
      Hash 1 / Hash 2 / Hash 3
    Version: 1
      Hash 1 / Hash 2 / Hash 4
    Version: 2
      Hash 3 / Hash 2 / Hash 2

Hope this clears some things up :-)

Mike

2010/7/26 Aaron Morton <aa...@thelastpickle.com>

> Some background reading:
> http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/
>
> Not sure on your follow-up question, so I'll just wildly blather on about things :)
>
> My assumption about your data is that you have 64K chunks that are identified by a hash and can somehow be grouped together into larger files (so there is a "file name" of sorts).
>
> One possible storage design (assuming the RandomPartitioner) is:
>
> A Chunks CF: each row in this CF uses the hash of the chunk as its key and has a single column with the chunk data. You could use more columns to store metadata here.
>
> A ChunkIndex CF: each row uses the file name (from above) as the key and has one column for each chunk in the file. The column name *could* be an offset for the chunk and the column value could be the hash for the chunk. Or you could use the chunk hash as the column name and the offset as the column value if needed.
>
> To rebuild the file, read the entire row from the ChunkIndex, then make a series of multigets to read all the chunks. Or you could lazily fetch only the ones you need.
>
> This all assumes that the "thousands" comment below means you may want to combine the chunks into 60+ MB files. It would be easier to keep all the chunks together in one row, but if you are going to have large (unbounded) file sizes this may not be appropriate.
>
> You could also think about using the order-preserving partitioner and a compound key for each row such as "file_name_hash.offset". Then, by using get_range_slices to scan the range of chunks for a file, you would not need to maintain a secondary index. There are some drawbacks to that approach - read the article above.
>
> Hope this helps,
> Aaron
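Roughly, the Chunks / ChunkIndex layout described above could look like the following. This is a minimal sketch only, not from the thread: it assumes a pycassa client, a keyspace named "Backup", pre-created standard CFs "Chunks" and "ChunkIndex", and purely illustrative helper names.

```python
# Sketch only (not from the thread): Chunks row key = chunk hash, single "data"
# column; ChunkIndex row key = file name, column name = zero-padded offset
# (sorts correctly under a UTF8/Bytes comparator), column value = chunk hash.
import hashlib

import pycassa

pool = pycassa.ConnectionPool('Backup', ['localhost:9160'])
chunks = pycassa.ColumnFamily(pool, 'Chunks')
chunk_index = pycassa.ColumnFamily(pool, 'ChunkIndex')

CHUNK_SIZE = 64 * 1024  # the 64K blobs discussed above


def store_file(file_name, data):
    """Store each 64K chunk once under its hash and index it by offset."""
    index_cols = {}
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        h = hashlib.sha1(chunk).hexdigest()
        chunks.insert(h, {'data': chunk})       # idempotent: same hash, same data
        index_cols[str(offset).zfill(12)] = h   # zero-pad so offsets sort as text
    chunk_index.insert(file_name, index_cols)
    # Note: the "Used: 3" reference count from the model above is not tracked here.


def read_file(file_name):
    """Read the index row, multiget the chunks, and stitch them together."""
    index = chunk_index.get(file_name, column_count=100000)  # page this for huge files
    hashes = list(index.values())
    rows = chunks.multiget(list(set(hashes)))   # duplicate hashes fetched only once
    return b''.join(rows[h]['data'] for h in hashes)
```

For unbounded file sizes the ChunkIndex row itself becomes very wide, which ties into the row size limits mentioned further down the thread.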
> On 26 Jul, 2010, at 04:01 PM, Michael Widmann <michael.widm...@gmail.com> wrote:
>
> Thanks for this detailed description...
>
> You mentioned the secondary index in a standard column family - would it be better to build several indexes?
> Is it even possible to build an index on, for example, 32 columns?
>
> The hint with the smaller boxes is very valuable!
>
> Mike
>
> 2010/7/26 Aaron Morton <aa...@thelastpickle.com>
>
>> For what it's worth...
>>
>> * Many smaller boxes with local disk storage are preferable to 2 with huge NAS storage.
>> * To cache the hash values, look at the KeysCached setting in the storage-config.
>> * There are some row size limits, see http://wiki.apache.org/cassandra/CassandraLimitations
>> * If you want to get 1000 blobs, rather than grouping them in a single row using a super column, consider building a secondary index in a standard column family: one CF for the blobs keyed by your hash, and one CF keyed by whatever the grouping key is, with a column for every blob's hash value. Read from the index first, then from the blobs themselves.
>>
>> Aaron
>>
>>
>> On 24 Jul, 2010, at 06:51 PM, Michael Widmann <michael.widm...@gmail.com> wrote:
>>
>> Hi Jonathan
>>
>> Thanks for your very valuable input on this.
>>
>> Maybe I didn't explain enough, so I'll try to clarify. Here are some thoughts:
>>
>> - Binary data will not be indexed - only stored.
>> - The file name of the binary data (a hash) should be indexed for search.
>> - We could group the hashes into 62 "entry" points for search retrieval - I think super columns, if I have the terminology right (a-z, A-Z, 0-9); see the sketch below.
>> - The 64K blobs' metadata (which blob belongs to which file) should be stored separately in Cassandra.
>> - For hardware we rely on Solaris / OpenSolaris with ZFS in the backend.
>> - Write operations occur much more often than reads.
>> - Memory should mainly hold the hash values for fast search (not the binary data).
>> - Read operations (restore from Cassandra) may be async - get about 1000 blobs, group them, restore.
>>
>> So my questions also are:
>>
>> 2 or 3 big boxes, or 10 to 20 small boxes for storage?
>> Could we separate "caching" - hash value CFs cached and indexed, binary data CFs not?
>> Writes happen around the clock - not at tremendous speed, but constantly.
>> Would compaction of the database need really much disk space?
>> Is it reliable at this size (this is more my fear)?
>>
>> Thanks for thinking and answers...
>>
>> Greetings,
>>
>> Mike
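As a rough illustration of the grouping idea above (62 entry points keyed by the first character of the hash), here is a sketch that follows Aaron's suggestion of a standard CF index rather than super columns. The CF name "HashIndex", the keyspace, and the pycassa usage are assumptions, not part of the thread.

```python
# Sketch only: HashIndex row key = first character of the hash (a-z, A-Z, 0-9,
# i.e. 62 possible rows), column name = full hash, column value unused here.
# The Chunks CF is the same as in the earlier sketch.
import pycassa
from pycassa import NotFoundException

pool = pycassa.ConnectionPool('Backup', ['localhost:9160'])
chunks = pycassa.ColumnFamily(pool, 'Chunks')
hash_index = pycassa.ColumnFamily(pool, 'HashIndex')


def blob_exists(h):
    """Check the index first so duplicate blobs are never uploaded twice."""
    try:
        hash_index.get(h[0], columns=[h])
        return True
    except NotFoundException:
        return False


def store_blob(h, data):
    if not blob_exists(h):
        chunks.insert(h, {'data': data})
    hash_index.insert(h[0], {h: ''})  # tracking a "Used" count would need a
                                      # separate read-modify-write scheme

# Caveat: with ~1 billion blobs these 62 index rows become extremely wide,
# which runs into the row size limits linked above - more entry points
# (e.g. the first two or three hash characters) would spread the load.
```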
>>
>> 2010/7/23 Jonathan Shook <jsh...@gmail.com>
>>
>>> There are two scaling factors to consider here. In general the worst-case growth of operations in Cassandra is kept near to O(log2(N)). Any worse growth would be considered a design problem, or at least a high-priority target for improvement. This is important when considering the load generated by very large column families, as binary search is used when the bloom filter doesn't exclude rows from a query. O(log2(N)) is basically the best achievable growth for this type of data, but the bloom filter improves on it in some cases by paying a small cost every time.
>>>
>>> The other factor to be aware of is the reduction of binary search performance for datasets which can push disk seek times into high ranges. This is mostly a direct consideration for installations which will be doing lots of cold reads (not cached data) against large sets. Disk seek times are much lower for adjacent or nearby tracks, and generally much higher when tracks are sufficiently far apart (as in a very large data set). This can compound with other factors when session times are longer, but that is to be expected with any system. Your storage system may have completely different characteristics depending on caching, etc.
>>>
>>> The read performance is still quite high relative to other systems for a similar data set size, but the drop-off in performance may be much worse than expected if you want it to be linear. Again, this is not unique to Cassandra. It's just an important consideration when dealing with extremely large sets of data, when memory is not likely to be able to hold enough hot data for the specific application.
>>>
>>> As always, the real questions have a lot more to do with your specific access patterns, storage system, etc. I would look at the benchmarking info available on the lists as a good starting point.
>>>
>>>
>>> On Fri, Jul 23, 2010 at 11:51 AM, Michael Widmann <michael.widm...@gmail.com> wrote:
>>> > Hi
>>> >
>>> > We plan to use Cassandra as data storage on at least 2 nodes with RF=2 for about 1 billion small files.
>>> > We have about 48TB of disc space behind each node.
>>> >
>>> > Now my question is: is this possible with Cassandra, reliably - meaning every blob is stored on 2 JBODs?
>>> >
>>> > We may grow up to nearly 40TB or more of Cassandra "storage" data...
>>> >
>>> > Has anyone out there done something similar?
>>> >
>>> > For retrieval of the blobs we are going to index them with a hash value (meaning hashes are used to store the blob)...
>>> > So we can search fast for the entry in the database and combine the blobs into a normal file again.
>>> >
>>> > Thanks for an answer
>>> >
>>> > Michael
>>
>>
>> --
>> bayoda.com - Professional Online Backup Solutions for Small and Medium Sized Companies
>
>
> --
> bayoda.com - Professional Online Backup Solutions for Small and Medium Sized Companies


--
bayoda.com - Professional Online Backup Solutions for Small and Medium Sized Companies
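As a closing back-of-the-envelope check on the numbers in the original question (a sketch only: it assumes every blob is a full 64K, takes the "nearly 40TB or more" post-dedup figure from the thread, and ignores SSTable and index overhead):

```python
# Rough capacity arithmetic for 1 billion 64K blobs on 2 nodes with RF=2.
TB = 1024 ** 4

blob_size = 64 * 1024
blob_count = 10 ** 9
replication_factor = 2

worst_case_tb = blob_size * blob_count / float(TB)   # ~60 TB if every blob were unique
after_dedup_tb = 40.0                                 # the "nearly 40TB or more" estimate above
replicated_tb = after_dedup_tb * replication_factor   # ~80 TB actually written to disk
available_tb = 2 * 48                                 # 96 TB of raw disk across 2 nodes

# ~80 TB of replicated data on 96 TB of disk leaves little headroom for
# compaction (which can temporarily need space on the order of the column
# family being compacted), which supports the "many smaller boxes" advice
# earlier in the thread.
```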