I'm wondering how to model my CFs (column families) to store the number of unique visitors in a time period so that I can query it quickly.
I thought of sharding them by day (row = 20120118, column = visitor_id, value = '') and performing a getcount on the row. This would work for unique visitors per day, and also per week or per month if I keep separate rows for those periods. It wouldn't work for unique visitors between 2 arbitrary dates, though, because 2 rows can share the same visitors (same columns). For example, I can have 1500 unique visitors today and 1000 unique visitors yesterday, but only 2000 unique visitors across both days, not 2500. I could fetch all the columns of these 2 rows and merge them in my client code to drop the duplicates, but performance won't be good with large data volumes.

Has anyone already worked out a data model for this? Thanks for your help ;)

Alain
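
P.S. To make the double-counting problem concrete, here is a minimal Python sketch of the client-side approach described above. It does not talk to Cassandra: plain in-memory sets stand in for the column names of two daily rows, and the visitor ids, the row key 20120117 and the helper name count_unique are made up for illustration so that the numbers match my example.

```python
def count_unique(rows):
    """Count distinct visitor ids across several daily rows.

    Each element of `rows` is a set of visitor ids, standing in for the
    column names of one daily row (hypothetical in-memory data).
    """
    merged = set()
    for columns in rows:
        merged |= columns  # set union drops visitors seen on several days
    return len(merged)


# Hypothetical rows: 1000 visitors yesterday, 1500 today, 500 seen on both days.
yesterday = {f"visitor_{i}" for i in range(0, 1000)}   # row 20120117
today = {f"visitor_{i}" for i in range(500, 2000)}     # row 20120118

naive_total = len(yesterday) + len(today)       # sum of the per-day counts
unique_total = count_unique([yesterday, today]) # count after removing duplicates

print(naive_total, unique_total)
```

Summing the per-day counts overstates the result by the number of visitors present in both rows. Getting the right number this way means pulling every column of every row in the range to the client, which is exactly the cost I would like to avoid.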