I need to store one trillion data points. The data is highly compressible, down to 1 byte per data point, using a simple custom scheme combined with standard dictionary compression. What's the most space-efficient way to store the data in Cassandra? How much per-row overhead is there if I store one data point per row?
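Concretely, the naive one-point-per-row layout I have in mind looks roughly like this (table and column names are just illustrative):

    CREATE TABLE points (
        series_id bigint,    -- which time series the point belongs to
        ts        timestamp, -- time of the data point
        value     blob,      -- ~1 byte after custom + dictionary compression
        PRIMARY KEY (series_id, ts)
    );

My assumption is that each such row carries per-row metadata (clustering values, write timestamps, and so on) that would dwarf the 1-byte payload even after SSTable compression, but I don't know the actual numbers.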
The data is particularly hard to group: it's a large number of time series with highly variable density, which makes it hard to pack subsets of the data into meaningful column families / wide rows. Is there a table layout scheme that would allow me to approach 1 byte per data point without forcing me to implement a complex abstraction layer at the application level?
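For reference, the kind of application-level packing I'm hoping to avoid looks roughly like this (the bucketing scheme and names are hypothetical):

    CREATE TABLE points_packed (
        series_id bigint,  -- which time series
        bucket    int,     -- client-chosen bucket; hard to size well with variable density
        points    blob,    -- many compressed points concatenated into a single cell
        PRIMARY KEY (series_id, bucket)
    );

Packing many points into one blob would amortize the per-row overhead, but choosing bucket boundaries for series of wildly different density, and splitting/merging blobs on writes, is exactly the abstraction layer I'd rather not build.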