When you say merge cells, do you mean re-aggregating the data into coarser
time buckets?
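
If so, something roughly like the following is what I would picture (the table,
column names and hourly granularity below are only illustrative):

CREATE TABLE data_rollup_1h (
    metric text,
    time_bucket timestamp,    -- raw timestamp truncated to the hour
    min_value double,
    max_value double,
    avg_value double,
    sample_count bigint,
    PRIMARY KEY ((metric), time_bucket)
) WITH CLUSTERING ORDER BY (time_bucket DESC);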

On Thu, Aug 4, 2016 at 5:59 AM Michael Burman <mibur...@redhat.com> wrote:

> Hi,
>
> Considering the following example structure:
>
> CREATE TABLE data (
>     metric text,
>     value double,
>     time timestamp,
>     PRIMARY KEY ((metric), time)
> ) WITH CLUSTERING ORDER BY (time DESC);
>
> The natural insert order is (metric, value, timestamp) tuples, for example one
> metric/value pair per second. That means adding more and more cells to the same
> partition, which creates a large amount of per-cell overhead and reduces the
> compression ratio of LZ4 & Deflate (LZ4 reaches ~0.26 and Deflate ~0.10 in some
> of the examples I've run). Now, to improve the compression ratio, how could I
> merge the cells on the actual Cassandra node? I looked at ICompressor, but it
> only provides byte-level compression.
>
> Could I do this in the compaction phase, for example by extending
> DateTieredCompactionStrategy? It has SSTableReader/Writer facilities and it
> seems to be able to see the rows. I'm fine with the fact that a repair run
> might have to do some conflict resolution, as the final merged rows would be
> quite "small" (50 kB) in size. The naive approach is of course to fetch all
> the rows from Cassandra, merge them on the client, and write them back, but
> this seems very wasteful and has its own problems. Compared to table-level
> LZ4 I was able to reduce the required size to 1/20th (context-aware
> compression is sometimes just so much better), so there are real benefits to
> this approach, even if I would probably violate multiple design decisions.
>
> One approach is of course to write to another store first and, once the blocks
> are ready, write them to Cassandra. But that again seems idiotic (I know some
> people use Kafka in front of Cassandra, for example, but that means maintaining
> yet another distributed system, which defeats the benefit of Cassandra's easy
> management & scalability).
>
> Has anyone done something similar, or even planned it? If I need to extend
> something in Cassandra, I can accept that approach as well, but as I'm not
> that familiar with the Cassandra source code, I could use some hints.
>
>   - Micke
>
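
Purely as a sketch of the "merge into small blocks" idea you describe (the table
and column names below are made up): if the raw points were packed into one blob
per closed time block, the target table could look roughly like

CREATE TABLE data_blocks (
    metric text,
    block_start timestamp,   -- start of the merged time block
    points blob,             -- encoded/compressed (time, value) pairs for the block
    PRIMARY KEY ((metric), block_start)
) WITH CLUSTERING ORDER BY (block_start DESC);

where whatever does the merge (the client, or a custom compaction) reads the raw
rows for a closed block, encodes and compresses them, writes one row here, and
then deletes the raw rows. That is essentially the "fetch, merge, write back"
flow you mention, just made explicit.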
