On Wed, Nov 2, 2011 at 7:26 PM, David Jeske wrote:
> - make sure the summarizer does try to do its job for a batch of counters
> until they are fully replicated and 'static' (no new increments will appear)
>
Apologies, that should read: make sure the summarizer (doesn't) try to do its job...
I understand what you are thinking, Daniel, but this approach has at least
one big wrinkle: you would be introducing dependencies between compaction
and replication.
The 'unique' idempotent records are required for Cassandra to read repair
properly. Therefore, if a compaction (or even a memtable flush) summarized
those records away before they were fully replicated, read repair would
have nothing left to reconcile.
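To make the wrinkle concrete: replaying a unique-keyed record is harmless,
while replaying a bare counter add double-counts. A toy illustration in
plain Python (a dict stands in for the row; all names here are made up):

# Each increment lives under its own unique column name, so a replay
# (read repair, hinted handoff, retry) overwrites instead of adding.
def apply_idempotent(row, unique_key, delta):
    row[unique_key] = delta

# A bare counter add is not idempotent: a replay double-counts.
def apply_summarized(total, delta):
    return total + delta

row = {}
apply_idempotent(row, "incr-0001", 5)
apply_idempotent(row, "incr-0001", 5)   # replayed delivery: harmless
assert sum(row.values()) == 5

total = 0
total = apply_summarized(total, 5)
total = apply_summarized(total, 5)      # replayed delivery: wrong
assert total == 10                      # double-counted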
Hm - kind of hijacking this thread, but since we have a similar problem I
might throw in my idea:
We need consistent, idempotent counters. On the client side we can create
unique (replayable) keys - like your user ids.
What we want to do is:
- add increment commands as columns such as [prefixByte.uniqueKey] (a rough sketch follows below)
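Something like this, roughly (plain Python, with a dict standing in for the
row; the prefix bytes and helper names are made up, not an existing API):

# Hypothetical sketch: idempotent counters via unique increment columns.
INCR = 0x01   # prefix for raw increment-command columns
SUMM = 0x00   # prefix for the rolled-up summary column

def add_increment(row, unique_key, delta):
    # Idempotent: writing the same (key, delta) twice changes nothing.
    row[(INCR, unique_key)] = delta

def summarize(row):
    # Fold increment columns into the summary column, then drop them.
    # This must only run once the increments are fully replicated and
    # static - the very dependency discussed above.
    incs = [k for k in row if k[0] == INCR]
    row[(SUMM, "total")] = row.get((SUMM, "total"), 0) + sum(row[k] for k in incs)
    for k in incs:
        del row[k]

def read_count(row):
    # Current value = summary plus any not-yet-summarized increments.
    return sum(row.values())

row = {}
add_increment(row, "user42:click:a1", 1)
add_increment(row, "user42:click:a1", 1)   # replay: no double count
add_increment(row, "user7:click:b2", 1)
summarize(row)
assert read_count(row) == 2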
Thanks, good point: splitting wide rows via sharding is a good
optimization for the get_count approach.
On Mon, Oct 31, 2011 at 10:58 AM, Zach Richardson wrote:
> Ed,
>
> I could be completely wrong about this working--I haven't specifically
> looked at how the counts are executed, but I think this makes sense.
Ed,
I could be completely wrong about this working--I haven't specifically
looked at how the counts are executed, but I think this makes sense.
You could potentially shard across several rows, based on a hash of
the username combined with the time period as the row key. Run a
count across each row and sum the results.
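Quick sketch of that (plain Python; the shard count and key format are
invented for illustration, and a dict of sets stands in for the column
family):

import hashlib

NUM_SHARDS = 16  # assumed; size to the expected row width

# {row_key: set of user-id columns} stands in for the column family.
store = {}

def shard_row_key(period, user_id):
    # Hash the username into a shard, combine with the time period.
    shard = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % NUM_SHARDS
    return "%s:%02d" % (period, shard)

def record_visit(period, user_id):
    store.setdefault(shard_row_key(period, user_id), set()).add(user_id)

def count_unique(period):
    # Count each shard row and sum; a given user always hashes to the
    # same shard, so the shards never overlap.
    return sum(len(store.get("%s:%02d" % (period, s), ()))
               for s in range(NUM_SHARDS))

record_visit("2011-10", "ed")
record_visit("2011-10", "ed")     # duplicate visit, one column
record_visit("2011-10", "zach")
assert count_unique("2011-10") == 2

Each shard row stays roughly 1/NUM_SHARDS the width of the single wide row,
so each individual count is cheaper, and the counts can run in parallel.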
I'm looking at the scenario of how to keep track of the number of
unique visitors within a given time period. Inserting user ids into a
wide row would allow me to have a list of every user within the time
period that the row represented. My experience in the past was that
using get_count on a row with a very large number of columns was slow.
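For concreteness, the pattern described above, sketched in plain Python
with a dict of sets standing in for the column family (row keys and ids
are made up):

# One row per time period, one column per user id, so duplicate visits
# collapse into a single column automatically.
visitors = {}

def record_visit(period, user_id):
    visitors.setdefault(period, set()).add(user_id)

def unique_visitors(period):
    # The analogue of get_count on the row: the number of columns *is*
    # the number of unique visitors, but counting a very wide row means
    # walking every column, which is what motivates the sharding above.
    return len(visitors.get(period, ()))

record_visit("2011-11-02", "alice")
record_visit("2011-11-02", "alice")  # same user twice, still one column
record_visit("2011-11-02", "bob")
assert unique_visitors("2011-11-02") == 2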