That spreadsheet doesn't take compression into account, which is very
important in my case. Uncompressed, my data is going to require a
petabyte of storage according to the spreadsheet. I am pretty sure I
won't get that much storage to play with.
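To make the numbers concrete, here is the back-of-envelope arithmetic (a quick Python sketch; the ~1000 bytes per point figure is just the petabyte estimate divided by the point count, not a number read off the spreadsheet itself):

    points = 10**12                    # one trillion data points
    compressed = points * 1            # ~1 byte per point after compression
    uncompressed = 10**15              # ~1 PB, per the spreadsheet

    print(compressed // 10**12, "TB compressed")      # -> 1 TB
    print(uncompressed // points, "bytes per point")  # -> 1000, implied overhead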
The spreadsheet also shows that Cassandra wastes an unbelievable amount of
space on compaction. My experiments with LevelDB, however, show that it is
possible for a write-optimized database to use negligible compaction
space. I am not sure how LevelDB does it. I guess it splits the larger
sstables into smaller chunks and merges them incrementally.
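To illustrate what I mean, here is a toy model of leveled compaction in Python (just my rough mental model, not LevelDB's actual code; the 1000-entry run size and the function names are made up):

    # Sstables are modelled as small, sorted runs of (key, value) pairs.
    # A compaction merges ONE run from level L with only the overlapping
    # runs in level L+1, so scratch space stays at a few run sizes no
    # matter how large the whole database grows.
    import heapq

    MAX_RUN = 1000  # entries per run; LevelDB uses ~2 MB files instead

    def overlaps(a, b):
        return a[0][0] <= b[-1][0] and b[0][0] <= a[-1][0]

    def compact_one(run, next_level):
        inputs = [run] + [r for r in next_level if overlaps(run, r)]
        merged = list(heapq.merge(*inputs))  # incremental sorted merge
        # Re-split the output into small runs; only the few inputs above
        # are ever held at once, which is why extra space is negligible.
        return [merged[i:i + MAX_RUN] for i in range(0, len(merged), MAX_RUN)]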
Anyway, does anybody know how densely I can store the data in
Cassandra when compression is enabled? Would I have to implement some
smart adaptive grouping to fit lots of records in one row, or is there a
simpler solution?
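By "smart adaptive grouping" I mean something along these lines (only a sketch of the idea; the 10000-point target and the (series_id, bucket_start) row key are made-up choices):

    # Pack points of one series into a wide row until it holds TARGET
    # points, then start a new row. Dense series get many short buckets,
    # sparse series get a few long ones, so row sizes stay comparable.
    TARGET = 10000  # points per wide row (arbitrary target)

    def group_points(series_id, points):
        """points: iterable of (timestamp, value), sorted by timestamp."""
        bucket_start, row = None, []
        for ts, value in points:
            if not row:
                bucket_start = ts
            row.append((ts, value))
            if len(row) == TARGET:
                yield (series_id, bucket_start), row
                row = []
        if row:
            yield (series_id, bucket_start), row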
On 4. 10. 2013 at 1:56, Andrey Ilinykh wrote:
It may help.
https://docs.google.com/spreadsheet/ccc?key=0Atatq_AL3AJwdElwYVhTRk9KZF9WVmtDTDVhY0xPSmc#gid=0
On Thu, Oct 3, 2013 at 1:31 PM, Robert Važan <robert.va...@gmail.com> wrote:
I need to store one trillion data points. The data is highly
compressible down to 1 byte per data point using simple custom
compression combined with standard dictionary compression. What's
the most space-efficient way to store the data in Cassandra? How
much per-row overhead is there if I store one data point per row?
The data is particularly hard to group. It's a large number of
time series with highly variable density. That makes it hard to
pack subsets of the data into meaningful column families / wide
rows. Is there a table layout scheme that would allow me to
approach 1 byte per data point without forcing me to implement a
complex abstraction layer at the application level?