Re: data model for unique users in a time period

David Jeske Wed, 02 Nov 2011 19:26:53 -0700

I understand what you are thinking, Daniel, but this approach has at least
one big wrinkle: you would be introducing dependencies between compaction
and replication.


The 'unique' idempotent records are required for Cassandra to read repair
properly. Therefore, if a compaction (or even a memtable flush) collapsed
those records into a sum, the system could no longer read repair the
counters. Your strategy is closer to how Bigtable/HBase handles
accumulators, but it works there because that system has a single
consistent write log.
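To illustrate the read repair point, here is a minimal sketch, not Cassandra's actual code. It assumes a replica's view of a counter row is a plain dict from a unique column name (a hypothetical "<timestamp>:<writer>" scheme) to an increment, and it simplifies reconciliation to a union of columns. Real Cassandra reconciles each column by its write timestamp, but because a given unique column carries the same value on every replica, the result here is the same. Unique records merge idempotently; a pre-summed column does not.

```python
from typing import Dict

# A replica's view of one counter row: unique column name -> increment value.
# The "<timestamp>:<writer>" naming is a hypothetical scheme for this sketch.
Row = Dict[str, int]

def read_repair_merge(a: Row, b: Row) -> Row:
    """Simplified reconciliation: union of columns. A unique column has the
    same value on every replica, so merging is idempotent and order-free."""
    merged = dict(a)
    merged.update(b)
    return merged

def total(row: Row) -> int:
    return sum(row.values())

replica_1 = {"1320286000:a": 1, "1320286001:b": 1}
replica_2 = {"1320286000:a": 1, "1320286002:c": 1}  # missed b's write, saw c's

repaired = read_repair_merge(replica_1, replica_2)
print(total(repaired))  # 3: increments a, b and c, each counted once

# If replica_1 had compacted its increments into one pre-summed column, the
# merge can no longer tell which increments that sum already includes:
compacted_1 = {"sum": 2}
broken = read_repair_merge(compacted_1, replica_2)
print(total(broken))  # 4: increment "1320286000:a" is counted twice
```

The unique names are what make repeated or out-of-order repairs safe; once they are folded into a sum, that information is gone.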

Here is a different approach to doing this with Cassandra...

- use timestamps as part of each column's unique name

- Don't try to use custom compaction. Instead, layer counter summarization
on top as a periodic summarization job.

- make sure the summarizer doesn't try to do its job for a batch of counters
until they are fully replicated and 'static' (no new increments will appear)

- write the 'summary' of a bunch of unique timestamps in a way that anyone
summing the values knows that the existence of a summary means they should
ignore the individual values for that range (because it will take time for
them to be deleted)
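The summarization scheme in the list above can be sketched as follows. This is an in-memory illustration under stated assumptions, not a Cassandra client: increments are a dict keyed by unique timestamps (assumed collision-free), SETTLE_SECONDS is a hypothetical window after which a range is treated as fully replicated and static, and summaries are assumed not to overlap. Readers trust a summary over its range and skip any individual values still present there, so the total is the same before and after the old columns are deleted.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical settle window: assume any increment older than this has been
# fully replicated and no new increments will land in that range.
SETTLE_SECONDS = 3600

@dataclass(frozen=True)
class Summary:
    start: int  # inclusive timestamp
    end: int    # exclusive timestamp
    total: int  # sum of the individual increments in [start, end)

def summarize(increments: Dict[int, int], summaries: List[Summary],
              start: int, end: int, now: int) -> List[Summary]:
    """Periodic job: write a summary for [start, end) only once the range is static."""
    if end > now - SETTLE_SECONDS:
        return summaries  # too recent: increments may still be arriving or replicating
    total = sum(v for ts, v in increments.items() if start <= ts < end)
    return summaries + [Summary(start, end, total)]

def read_total(increments: Dict[int, int], summaries: List[Summary]) -> int:
    """Sum all summaries, plus only those increments no summary covers.
    Covered increments may still exist because deletion happens lazily."""
    def covered(ts: int) -> bool:
        return any(s.start <= ts < s.end for s in summaries)
    return (sum(s.total for s in summaries)
            + sum(v for ts, v in increments.items() if not covered(ts)))

# Usage: keys are unique per-increment timestamps.
increments = {100: 1, 200: 1, 5000: 1}
summaries = summarize(increments, [], start=0, end=1000, now=10_000)
print(read_total(increments, summaries))  # 3, with the old columns still present
for ts in (100, 200):
    del increments[ts]  # lazy cleanup of columns the summary now represents
print(read_total(increments, summaries))  # still 3 after deletion
```

Because the summary is written before the individual columns are removed, a reader that sees both must prefer the summary; ignoring covered values is what keeps the interleaving of summarization and deletion from double counting.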
