Re: data model for unique users in a time period

2011-11-03 Thread David Jeske
On Wed, Nov 2, 2011 at 7:26 PM, David Jeske wrote:
> - make sure the summarizer does try to do its job for a batch of counters
> until they are fully replicated and 'static' (no new increments will appear)
Apologies: make the summarizer (doesn't) try to do its job...

Re: data model for unique users in a time period

2011-11-02 Thread David Jeske
I understand what you are thinking, Daniel, but this approach has at least one big wrinkle. You would be introducing dependencies between compaction and replication. The 'unique' idempotent records are required for Cassandra to read repair properly. Therefore, if a compaction (or even a memtable f

Re: data model for unique users in a time period

2011-11-01 Thread Daniel Doubleday
Hm - kind of hijacking this but since we have a similar problem I might throw in my idea: We need consistent, idempotent counters. On the client side we can create unique (replayable) keys - like your user ids. What we want to do is: - add increment commands as columns such as [prefixByte.uniq
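
A rough sketch of the idempotent-counter idea described above, not the poster's actual design: each increment is stored as a column keyed by a client-generated unique id, so replaying the same command rewrites the same column instead of adding to the total. The class `IdempotentCounter`, its method names, the example command ids, and the in-memory dict standing in for a Cassandra row are all assumptions made for illustration. The column name layout in the post is cut off and is not reproduced here.

```python
class IdempotentCounter:
    """In-memory stand-in (hypothetical) for one row of increment commands."""

    def __init__(self):
        # column name (unique command id) -> increment amount
        self.columns = {}

    def increment(self, command_id, amount=1):
        # Writing the same command_id twice overwrites the same column,
        # so a replayed write leaves the total unchanged.
        self.columns[command_id] = amount

    def value(self):
        # The counter value is the sum over all stored increment commands.
        return sum(self.columns.values())


counter = IdempotentCounter()
counter.increment("user-42:visit-1")
counter.increment("user-42:visit-1")  # replay of the same command
counter.increment("user-7:visit-1", amount=2)
total = counter.value()
```

Here `total` is 3 rather than 4, because the replayed command lands on a column that already exists.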

Re: data model for unique users in a time period

2011-10-31 Thread Ed Anuff
Thanks, good point, splitting wide rows via sharding is a good optimization for the get_count approach.
On Mon, Oct 31, 2011 at 10:58 AM, Zach Richardson wrote:
> Ed,
>
> I could be completely wrong about this working--I haven't specifically
> looked at how the counts are executed, but I think th

Re: data model for unique users in a time period

2011-10-31 Thread Zach Richardson
Ed, I could be completely wrong about this working--I haven't specifically looked at how the counts are executed, but I think this makes sense. You could potentially shard across several rows, based on a hash of the username combined with the time period as the row key. Run a count across each r
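
The sharding suggestion above could look roughly like the following. This is a minimal sketch under stated assumptions: the shard count `NUM_SHARDS`, the `period:shard` row key format, the use of MD5 as the hash, and the in-memory `rows` dict standing in for Cassandra rows are all hypothetical. Each user id hashes to exactly one shard per time period, so the unique count for a period is the sum of the per-shard column counts.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical shard count per time period

# row key -> set of column names (user ids); stands in for wide rows
rows = defaultdict(set)


def shard_row_key(period, user_id, num_shards=NUM_SHARDS):
    """Build a row key from the time period and a hash of the user id."""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    shard = int(digest, 16) % num_shards
    return f"{period}:{shard}"


def record_visit(period, user_id):
    # Column names are unique within a row, so repeat visits collapse.
    rows[shard_row_key(period, user_id)].add(user_id)


def unique_users(period, num_shards=NUM_SHARDS):
    # One count per shard row, summed on the client.
    return sum(
        len(rows.get(f"{period}:{shard}", ()))
        for shard in range(num_shards)
    )


record_visit("2011-10-31", "alice")
record_visit("2011-10-31", "bob")
record_visit("2011-10-31", "alice")  # repeat visit, same column
record_visit("2011-11-01", "carol")
uniques_oct_31 = unique_users("2011-10-31")
```

With these visits, `uniques_oct_31` is 2, and each per-shard count touches a narrower row than a single unsharded row would.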

data model for unique users in a time period

2011-10-31 Thread Ed Anuff
I'm looking at the scenario of how to keep track of the number of unique visitors within a given time period. Inserting user ids into a wide row would allow me to have a list of every user within the time period that the row represented. My experience in the past was that using get_count on a row