People keep asking me whether we ever found a solution (even though this
thread is more than 3 years old), so I will update it with our findings.

We finally managed to do this, thanks to our big data and reporting stacks,
by storing blobs that correspond to HLL (HyperLogLog) structures. HLL is an
algorithm used by Google, Twitter and many others to solve count-distinct
problems. Structures built with this algorithm can be "summed" (merged) to
give a good approximation of the UV (unique visitors) number.

The precision you reach depends on the size of the structure you choose,
and it is predictable. You can get a fairly acceptable approximation with
small data structures.
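
To give an idea of the size/precision trade-off, here is a small sketch. It
assumes the standard HyperLogLog error estimate of about 1.04 / sqrt(m) for m
registers, and 6 bits per register, which is a common choice but not
necessarily what your library uses. The function names are made up for this
example.

```python
import math


def hll_relative_error(p):
    """Approximate standard error of an HLL with 2**p registers."""
    m = 1 << p
    return 1.04 / math.sqrt(m)


def hll_size_bytes(p, bits_per_register=6):
    """Approximate storage size, assuming densely packed registers."""
    return (1 << p) * bits_per_register // 8


for p in (10, 12, 14):
    print(p, f"{hll_relative_error(p):.2%}", hll_size_bytes(p), "bytes")
```

With p = 12, for instance, this gives an error of roughly 1.6% for about
3 KB per structure.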

So we basically store one HLL per hour and just "sum" the HLLs for all the
hours in the requested range (you can do it at day level or any other level,
depending on your needs).
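
The idea can be sketched as follows. This is a minimal, illustrative
HyperLogLog in Python, not the code we run: the class HLL, the helper
unique_visitors, the SHA-1 based hash, the one-byte-per-register blob format
and the in-memory dict standing in for the blob column are all assumptions
made for the example. The point is that "summing" two HLLs means taking the
register-wise maximum, so a visitor seen in several hours is counted once.

```python
import hashlib
import math


class HLL:
    """Minimal HyperLogLog sketch (illustrative only)."""

    def __init__(self, p=12, registers=None):
        self.p = p
        self.m = 1 << p  # number of registers
        if registers is None:
            self.registers = bytearray(self.m)
        else:
            self.registers = bytearray(registers)

    def add(self, item):
        # Take 64 bits of a SHA-1 digest as the hash value.
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)               # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)  # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other):
        # "Summing" two HLLs: keep the larger value of each register.
        assert self.p == other.p, "both HLLs must use the same precision"
        merged = bytes(max(a, b) for a, b in zip(self.registers, other.registers))
        return HLL(self.p, merged)

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)  # constant valid for m >= 128
        est = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if est <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            est = self.m * math.log(self.m / zeros)
        return round(est)

    def to_blob(self):
        return bytes(self.registers)  # one byte per register

    @classmethod
    def from_blob(cls, blob, p=12):
        return cls(p, blob)


def unique_visitors(store, hours, p=12):
    """Merge the stored hourly blobs and estimate the distinct count."""
    total = HLL(p)
    for hour in hours:
        if hour in store:
            total = total.merge(HLL.from_blob(store[hour], p))
    return total.count()


# Hypothetical usage: two hours that share half of their visitors.
store = {}  # hour -> blob
for hour, ids in (("2015-01-01T10", range(0, 10000)),
                  ("2015-01-01T11", range(5000, 15000))):
    hll = HLL()
    for i in ids:
        hll.add(f"user-{i}")
    store[hour] = hll.to_blob()

print(unique_visitors(store, ["2015-01-01T10", "2015-01-01T11"]))
```

The printed estimate should be close to the 15000 distinct visitors, whereas
adding the two hourly counts would give about 20000. This is why storing the
structures instead of the counts makes custom ranges possible.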

Hope this will help some of you; we finally had this (good) idea after more
than 3 years. Actually, we have been using HLL for a long time, but the idea
of storing HLL structures instead of counts allows us to query custom ranges
(at the price of more intelligence in the reporting stack, which must read
and smartly sum the HLLs stored as blobs). We have been happy with it ever
since.

C*heers,

Alain

2012-01-19 22:21 GMT+01:00 Milind Parikh <milindpar...@gmail.com>:

> You might want to look at the code in countandra.org, regardless of
> whether you use it. It uses a model of dynamic composite keys (although
> static composite keys would have worked as well). For the actual query, only
> one row is hit. This of course only works because the data model is attuned
> to the query.
>
> Regards
> Milind
>
> /***********************
> sent from my android...please pardon occasional typos as I respond @ the
> speed of thought
> ************************/
>
> On Jan 19, 2012 1:31 AM, "Alain RODRIGUEZ" <arodr...@gmail.com> wrote:
>
> Hi, thanks for your answer, but I don't want to add another layer on top of
> Cassandra. I have also built my whole application without Countandra and I
> would like to continue this way.
>
> Furthermore there is a Cassandra modeling problem that I would like to
> solve, and not just hide.
>
> Alain
>
>
>
> 2012/1/18 Lucas de Souza Santos <lucas...@gmail.com>
> >
> > Why not http://www.countandra.org/
> >
> >
> > ...
>
>
