On Thu, Jan 19, 2012 at 8:25 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

>
> I'm still in the dark about how to get the number of unique visitors
> between 2 dates (randomly chosen, because chosen by user) efficiently.
>
> I could easily count them per hour, day, week, month... But it's a bit
> harder to give this statistic between 2 unknown dates as explained at the
> start of this thread.
>
> Am I missing any clue in these slides ?
>

Sometimes you will be fetching slices of multiple rows.

Basically, here's the procedure, given a start time t1 and an end time t2:
1. Determine all buckets (row keys) that hold data between t1 and t2.
Usually this means finding the bucket that t1 falls in, the bucket that t2
falls in, and then all buckets in between.
2. Use t1 as the column slice start, t2 as the column slice end, and
multiget all of the buckets that you just calculated.
3. Merge the results by concatenating the rows in order.
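
To make the three steps concrete, here is a rough sketch in Python. It is
not real client code: the "store" is just an in-memory dict standing in for
the column family, the one-row-per-day bucketing and the key format are
assumptions, and bucket_keys / slice_range are made-up names. With a real
client, the loop body would be a single multiget with a column slice.

```python
from datetime import datetime, timedelta

def bucket_keys(t1, t2):
    """Step 1: list the day-bucket row keys that cover [t1, t2], in order."""
    day = t1.replace(hour=0, minute=0, second=0, microsecond=0)
    keys = []
    while day <= t2:
        keys.append(day.strftime("%Y-%m-%d"))  # assumed row key format
        day += timedelta(days=1)
    return keys

def slice_range(store, t1, t2):
    """Steps 2 and 3: slice every bucket with the same (t1, t2) bounds,
    then concatenate the rows in bucket order.

    store maps row key -> list of (timestamp, value) sorted by timestamp,
    which imitates columns sorted by a time-based comparator.
    """
    merged = []
    for key in bucket_keys(t1, t2):
        row = store.get(key, [])
        merged.extend((ts, value) for ts, value in row if t1 <= ts <= t2)
    return merged

# Toy data: column values are visitor ids (an assumption for this example).
store = {
    "2012-01-18": [(datetime(2012, 1, 18, 23, 0), "alice")],
    "2012-01-19": [(datetime(2012, 1, 19, 8, 0), "bob"),
                   (datetime(2012, 1, 19, 12, 0), "alice")],
    "2012-01-20": [(datetime(2012, 1, 20, 1, 0), "carol"),
                   (datetime(2012, 1, 20, 9, 0), "dave")],
}
t1 = datetime(2012, 1, 18, 22, 0)
t2 = datetime(2012, 1, 20, 2, 0)
visits = slice_range(store, t1, t2)
unique_visitors = {value for _, value in visits}
```

If the column values are visitor ids as assumed here, the unique count for
the range is just the size of that set, computed on the client after the
merge.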

Note that the only rows where you will end up getting a partial slice are
the first and last rows.  For all of the rows in between, you will end up
fetching the entire row.  This is fine, because t1 will be less than all of
the column names in those rows, and t2 will be greater than all of the
column names in those rows.
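
A small illustration of why the same bounds work for every row. This is
only a sketch: plain integers stand in for timestamps, each list stands in
for one row's sorted column names, and column_slice is a made-up helper,
not a client call.

```python
from bisect import bisect_left, bisect_right

def column_slice(columns, start, end):
    """Return the sorted column names that fall in [start, end], inclusive."""
    lo = bisect_left(columns, start)
    hi = bisect_right(columns, end)
    return columns[lo:hi]

# Three buckets of width 100: [0, 100), [100, 200), [200, 300).
first_row = [10, 40, 90]
middle_row = [110, 150, 199]
last_row = [205, 260]

t1, t2 = 40, 205

first_part = column_slice(first_row, t1, t2)    # partial: drops columns before t1
middle_part = column_slice(middle_row, t1, t2)  # whole row: every column is inside
last_part = column_slice(last_row, t1, t2)      # partial: drops columns after t2
merged = first_part + middle_part + last_part
```

The middle row comes back complete without any special casing, because
both bounds lie outside its range of column names.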

-- 
Tyler Hobbs
DataStax <http://datastax.com/>
