> Some possibilities open up when using OPP, especially with aggregate
> keys. This is more of an option when RF==cluster size, but not
> necessarily a good reason to make RF=cluster size if you haven't
> already.
This use of the OPP sounds like the way Lucandra stores data: they want
range scans but also some random key distribution.

http://github.com/tjake/Lucandra

See the hash_key() function in CassandraUtils.java for how they
manually hash the key before storing it in Cassandra; a sketch of the
idea is below.

> 64MB per row, 1MB columns
> customerABC:file123:00000000 (colnames: 00000000, 00100000, 00200000, ...)
> customerABC:file123:04000000 (colnames: 04000000, 04100000, ... )
> if 0xFFFFFFFF is not enough for the file size (4,294,967,295), then
> you can start with 10 or 12 digits instead (up to 2.8e+14)

Grouping chunks together into larger groups/extents is an interesting
idea; you could have a 'read ahead' buffer. I'm sure somewhere in all
these designs there is a magical balance between row size and the
number of rows (the addressing arithmetic for the layout above is
sketched below, too). They were saying chunks with the same hash should
only be stored once, though, so I'm not sure it's applicable in this
case.

> If you needed to add metadata to chunk groups/chunks, you can use
> column names which are disjoint from '0'-'F', as long as your API
> knows how to set your predicates up likewise. If there is at least one
> column name which is dependable in each chunk row, then you can use it
> as your predicate for "what's out there" queries. This avoids loading
> column data for the chunks when looking up names (row/file/... names).
> On the other hand, if you use an empty predicate, there is not an easy
> way to avoid tombstone rows unless you make another trip to Cassandra
> to verify.

I've experimented with namespacing columns before, and found it easier
to use a super CF in the long run. If you do stick with namespaced
columns, there is a predicate sketch below as well.
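First, a sketch of the key-hashing idea, for illustration only (the
real hash_key() in CassandraUtils.java may differ): prefix each key
with a fixed-width hash of its grouping component, so groups scatter
evenly around the ring under OPP while keys inside a group stay
contiguous and range-scannable.

import java.math.BigInteger;
import java.security.MessageDigest;

public class KeyHash {

    // Prefix the key with an MD5 of its grouping component. Different
    // groups land pseudo-randomly around the ring, but keys sharing a
    // group share a prefix and therefore sort adjacently under OPP.
    public static String hashKey(String group, String rest) throws Exception {
        byte[] digest = MessageDigest.getInstance("MD5")
                                     .digest(group.getBytes("UTF-8"));
        // Zero-padded hex keeps the prefix fixed-width so it sorts bytewise.
        return String.format("%032x", new BigInteger(1, digest))
                + "/" + group + "/" + rest;
    }

    public static void main(String[] args) throws Exception {
        // Both keys for file123 share a prefix, so they stay adjacent.
        System.out.println(hashKey("file123", "00000000"));
        System.out.println(hashKey("file123", "04000000"));
    }
}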
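For the 64MB-row / 1MB-column layout quoted above, the addressing
arithmetic is just rounding the byte offset down to the row and chunk
boundaries. The key format is taken from the example; the constants
are assumptions you would tune:

public class ChunkAddress {

    static final long CHUNK_SIZE = 1L << 20;  // 1MB per column
    static final long ROW_SPAN   = 64L << 20; // 64MB of chunks per row

    // Row key covering a byte offset, e.g. "customerABC:file123:04000000".
    static String rowKey(String customer, String file, long offset) {
        return String.format("%s:%s:%08X", customer, file,
                             offset / ROW_SPAN * ROW_SPAN);
    }

    // Column name within that row, e.g. "04100000".
    static String columnName(long offset) {
        return String.format("%08X", offset / CHUNK_SIZE * CHUNK_SIZE);
    }

    public static void main(String[] args) {
        long offset = 65L << 20; // byte 65MB lands in the second row
        System.out.println(rowKey("customerABC", "file123", offset)); // ...:04000000
        System.out.println(columnName(offset));                       // 04100000
    }
}

If 8 hex digits run out, widening to 10 or 12 is only a change to the
format width.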
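And the "dependable column" trick from the last quote, if you stay with
namespaced columns rather than a super CF. The "_name" marker column is
my invention here, and the column-name type in the generated Thrift
code varies between versions (byte[] vs ByteBuffer), so treat this as a
rough sketch:

import java.nio.ByteBuffer;
import java.util.Arrays;
import org.apache.cassandra.thrift.SlicePredicate;

public class MetaPredicate {

    // "_name" never collides with chunk columns because '_' is outside
    // the hex digits '0'-'F'. Slicing on just this name answers "what's
    // out there" queries without deserialising 1MB chunk values, and a
    // row that comes back without it can be treated as a tombstone.
    public static SlicePredicate markerOnly() {
        SlicePredicate predicate = new SlicePredicate();
        predicate.setColumn_names(
                Arrays.asList(ByteBuffer.wrap("_name".getBytes())));
        return predicate;
    }
}

Pass that predicate to get_slice()/get_range_slices() when enumerating
files, and only do a full slice when you actually read chunks.

Cheers
Aaron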