Ed,

I could be completely wrong about this working--I haven't looked
specifically at how the counts are executed, but I think the approach
makes sense.

You could shard across several rows, using the time period combined
with a hash of the username (mod some fixed shard count) as the row
key.  Run a count against each shard row and then add the results up.
If your cluster is large enough, this could spread the work around
enough to make each count query a bit faster.  A rough sketch is below.
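
Something like this, as a rough Python sketch -- NUM_SHARDS is an
assumed fixed shard count, and get_count is a stand-in for whatever
count call your client library exposes (e.g. pycassa's
ColumnFamily.get_count):

    import hashlib

    NUM_SHARDS = 16  # assumed fixed shard count; tune to your cluster

    def shard_row_key(username, period):
        # Row key = time period plus a shard index derived from the
        # username, so one period's users are spread over NUM_SHARDS rows.
        digest = hashlib.md5(username.encode("utf-8")).hexdigest()
        shard = int(digest, 16) % NUM_SHARDS
        return "%s:%02d" % (period, shard)

    def unique_visitor_count(period, get_count):
        # Sum the column counts over all shard rows for one time period.
        # get_count(row_key) is assumed to return a row's column count.
        return sum(get_count("%s:%02d" % (period, shard))
                   for shard in range(NUM_SHARDS))

Since hash(username) always maps a given user to the same shard row,
inserting the user id as the column name still dedupes per user, so the
summed counts stay unique.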

Depending on how often this query would be hit, I would still
recommend caching the total, but with the counts sharded you could
afford to recompute the real number a little more often.  Something
like the sketch below.
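
A minimal TTL cache on top of the sharded sum, assuming the
unique_visitor_count sketch above (CACHE_TTL and the in-process dict
are assumptions, nothing Cassandra-specific):

    import time

    CACHE_TTL = 60  # seconds between recomputes; assumed, tune as needed
    _cache = {}     # period -> (computed_at, count)

    def cached_unique_visitor_count(period, get_count):
        # Serve the cached total until it ages out, then recompute the
        # sharded sum via unique_visitor_count from the sketch above.
        entry = _cache.get(period)
        if entry is not None and time.time() - entry[0] < CACHE_TTL:
            return entry[1]
        count = unique_visitor_count(period, get_count)
        _cache[period] = (time.time(), count)
        return count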

Zach


On Mon, Oct 31, 2011 at 12:22 PM, Ed Anuff <e...@anuff.com> wrote:
> I'm looking at the scenario of how to keep track of the number of
> unique visitors within a given time period.  Inserting user ids into a
> wide row would allow me to have a list of every user within the time
> period that the row represented.  My experience in the past was that
> using get_count on a row to get the column count got slow pretty
> quickly, but that might still be the easiest way to get the count of
> unique users, with some sort of caching so that it's not expensive on
> subsequent calls.  Using Hadoop is overkill for this scenario.
> Any other approaches?
>
> Ed
>
