Re: User click count

2014-12-31 Thread Ajay
Thanks Eric. Happy New Year 2015 to all Cassandra developers and users :). This seems to be the most active group of all the Apache big data projects. I will come back with more questions :) Thanks Ajay
On Dec 31, 2014 8:02 PM, "Eric Stevens" wrote:
> You can totally avoid the impact of tombstones by rotatin

Re: User click count

2014-12-31 Thread Eric Stevens
You can totally avoid the impact of tombstones by rotating your partition key in the exact counts table, and only deleting whole partitions once you've counted them. Once you've counted them, you never have cause to read that partition key again. You can totally store the final counts in Cassandra
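A minimal in-memory sketch of the rotating-partition idea, assuming hourly buckets. The dicts stand in for Cassandra tables, and `clicks`, `final_counts`, `hour_bucket`, `record_click` and `roll_up` are hypothetical names. Random UUIDs stand in for per-click ids. The point is that the roll-up deletes whole partitions that are already closed, never individual clicks inside a partition that is still being read.

```python
import uuid
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical stand-in for the exact-counts table. The partition key is
# (link_id, hour_bucket); each partition holds the ids of its clicks.
clicks = defaultdict(set)
# Hypothetical stand-in for the table that keeps the final totals.
final_counts = {}


def hour_bucket(ts):
    """Rotate the partition key every hour, e.g. '2014-12-31T19'."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H")


def record_click(link_id, ts):
    clicks[(link_id, hour_bucket(ts))].add(uuid.uuid4())


def roll_up(current_bucket):
    """Count every partition older than the current bucket, store its total,
    then delete the whole partition; its key is never read again."""
    # ISO-style bucket strings sort chronologically, so '<' means "older".
    for key in [k for k in clicks if k[1] < current_bucket]:
        final_counts[key] = len(clicks[key])
        del clicks[key]


# Usage: two clicks in the 19:00 hour, one in the 20:00 hour.
for hour, minute in [(19, 5), (19, 40), (20, 1)]:
    record_click("ad-42", datetime(2014, 12, 31, hour, minute, tzinfo=timezone.utc))

roll_up(hour_bucket(datetime(2014, 12, 31, 20, 1, tzinfo=timezone.utc)))
print(final_counts)  # {('ad-42', '2014-12-31T19'): 2}
```

Only the still-open 20:00 partition remains in `clicks`, so later reads never touch the deleted (tombstoned) partitions.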

Re: User click count

2014-12-30 Thread Ajay
Thanks Janne and Rob. The idea is this: store the user clicks in Cassandra, and have a scheduler count/aggregate the clicks per link or ad hourly/daily/monthly and store the results in MySQL (or maybe in Cassandra itself). Since tombstones will be deleted only after some days (as per configuration),
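The aggregation step of that scheduler could look roughly like the sketch below. It assumes the raw clicks have already been read back as `(link_id, timestamp)` pairs. The function name `aggregate` and the bucket string formats are hypothetical, and writing the results to MySQL or Cassandra is left out.

```python
from collections import Counter
from datetime import datetime


def aggregate(clicks):
    """Roll raw (link_id, timestamp) click events up into per-link
    hourly, daily and monthly totals."""
    hourly, daily, monthly = Counter(), Counter(), Counter()
    for link_id, ts in clicks:
        hourly[(link_id, ts.strftime("%Y-%m-%d %H:00"))] += 1
        daily[(link_id, ts.strftime("%Y-%m-%d"))] += 1
        monthly[(link_id, ts.strftime("%Y-%m"))] += 1
    return hourly, daily, monthly


raw = [
    ("ad-1", datetime(2014, 12, 30, 9, 15)),
    ("ad-1", datetime(2014, 12, 30, 9, 45)),
    ("ad-1", datetime(2014, 12, 30, 17, 5)),
    ("ad-2", datetime(2014, 12, 31, 23, 59)),
]
hourly, daily, monthly = aggregate(raw)
print(hourly[("ad-1", "2014-12-30 09:00")])  # 2
print(daily[("ad-1", "2014-12-30")])         # 3
print(monthly[("ad-2", "2014-12")])          # 1
```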

Re: User click count

2014-12-30 Thread Robert Coli
On Mon, Dec 29, 2014 at 6:05 AM, Ajay wrote:
> In my case, Cassandra is the only storage. If the counters get incorrect,
> it couldn't be corrected.

Cassandra counters are not appropriate for this use case if correctness is a requirement. =Rob

Re: User click count

2014-12-30 Thread Janne Jalkanen
Hi! Yes, since all the writes for a partition (or row, if you speak Thrift) always go to the same replicas, you will need to design to avoid hotspots: a pure day row will cause all the writes for a single day to go to the same replicas, so those nodes will have to work really hard for a day, a
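One common way to avoid a single hot day partition is to add a shard number to the partition key. The sketch below is a hedged illustration of that technique, not something proposed in this thread. `NUM_SHARDS`, `partition_key` and `partitions_for_day` are hypothetical, and 8 shards is an arbitrary assumption. Each `(day, shard)` partition gets its own token, so a day's writes can land on different replica sets. The cost is that a reader has to query every shard and sum the results.

```python
import zlib
from collections import Counter

# Assumption: the number of shards per day is a tuning knob, not a fixed value.
NUM_SHARDS = 8


def partition_key(day, click_id):
    """Spread one day's writes over NUM_SHARDS partitions instead of one.
    crc32 is used because it is stable across processes (unlike hash())."""
    return (day, zlib.crc32(click_id.encode("utf-8")) % NUM_SHARDS)


def partitions_for_day(day):
    """A reader must query every shard of the day and sum the results."""
    return [(day, shard) for shard in range(NUM_SHARDS)]


# Simulate 1,000 clicks on one day and see how the writes are spread.
writes = Counter(partition_key("2014-12-30", f"click-{i}") for i in range(1000))
print(len(writes), "partitions used")
print(sum(writes[k] for k in partitions_for_day("2014-12-30")))  # 1000
```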

Re: User click count

2014-12-29 Thread Ajay
Thanks Janne, Alain and Eric. Now say I go with counters (hourly, daily, monthly) and also store the UUIDs as below: user Id : /mm/dd as the row key, with a dynamic column for each click whose column key is the timestamp and whose value is empty. Periodically, count the columns and rows and correct the counters. Now
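A rough sketch of that "count and correct" job follows. It assumes the row key reads as `userId:yyyy/mm/dd`, which is a guess at the partly garbled key format above. The dicts `raw_rows` and `daily_counters` and the helper functions are hypothetical stand-ins for the two tables. Because Cassandra counter columns can only be incremented or decremented, the correction is applied as a delta rather than by setting the value.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical stand-ins for the two tables in the proposed design:
# raw_rows: row key "userId:yyyy/mm/dd" -> {click timestamp: empty value}
# daily_counters: same row key -> counter value (fast to read, may drift)
raw_rows = defaultdict(dict)
daily_counters = defaultdict(int)


def row_key(user_id, ts):
    return f"{user_id}:{ts:%Y/%m/%d}"


def record_click(user_id, ts):
    key = row_key(user_id, ts)
    raw_rows[key][ts.isoformat()] = b""  # column key = timestamp, empty value
    daily_counters[key] += 1


def correct_counters():
    """Periodic job: recount the columns of each row and fix drifted counters
    with a delta, since counters only support increment/decrement."""
    corrections = {}
    for key, columns in raw_rows.items():
        delta = len(columns) - daily_counters[key]
        if delta:
            daily_counters[key] += delta
            corrections[key] = delta
    return corrections


record_click("user1", datetime(2014, 12, 29, 10, 0, 1))
record_click("user1", datetime(2014, 12, 29, 10, 0, 2))
daily_counters["user1:2014/12/29"] += 1    # simulate a drifted (double) increment
print(correct_counters())                  # {'user1:2014/12/29': -1}
print(daily_counters["user1:2014/12/29"])  # 2
```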

Re: User click count

2014-12-29 Thread Eric Stevens
> If the counters get incorrect, it couldn't be corrected

You'd have to store something that allowed you to correct it. For example, use the TimeUUID approach to keep true counts, which are slow to read but accurate, and a background process that trues up your counter columns periodically. On Mon, De
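The "true count" side of Eric's suggestion is sketched below with Python's version-1 (time-based) UUIDs standing in for Cassandra TimeUUIDs. Each click gets its own id, and the click time can be recovered from the id, so an exact count for a time window is possible. That count requires examining every stored id, which is why it is slower to read than a counter. `exact_count` and `timeuuid_to_datetime` are hypothetical helper names. The offset constant is the gap between the UUID epoch (1582-10-15) and the Unix epoch in 100-ns units.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Offset between the UUID v1 epoch (1582-10-15) and the Unix epoch,
# in the 100-nanosecond units that uuid.UUID.time uses.
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def timeuuid_to_datetime(u):
    """Recover the wall-clock time embedded in a version-1 (time-based) UUID."""
    return datetime.fromtimestamp((u.time - UUID_EPOCH_OFFSET) / 1e7, tz=timezone.utc)


def exact_count(click_ids, start, end):
    """True count for [start, end): every stored TimeUUID is examined, which
    is why this path is slow to read compared with a counter column."""
    return sum(1 for u in click_ids if start <= timeuuid_to_datetime(u) < end)


# One time-based UUID per click.
now = datetime.now(timezone.utc)
click_ids = {uuid.uuid1() for _ in range(5)}
print(exact_count(click_ids, now - timedelta(minutes=1), now + timedelta(minutes=1)))  # 5
```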

Re: User click count

2014-12-29 Thread Ajay
Thanks for the clarification. In my case, Cassandra is the only storage. If the counters become incorrect, they can't be corrected. If we have to store the raw data for that anyway, we might as well go with that approach. But the granularity has to be at the seconds level, since more than one user can click the same link. So the data
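The granularity concern can be shown in a few lines. If the column key is the click time truncated to seconds, clicks from different users in the same second overwrite one another. A per-click TimeUUID (here Python's `uuid.uuid1()`, standing in for a Cassandra TimeUUID) keeps one column per click. In a real system the id would be generated when each click arrives. The dicts are hypothetical stand-ins for one row's columns.

```python
import uuid
from datetime import datetime

# Three different users click the same link within the same second.
click_times = [
    datetime(2014, 12, 29, 12, 0, 0, 100000),
    datetime(2014, 12, 29, 12, 0, 0, 400000),
    datetime(2014, 12, 29, 12, 0, 0, 900000),
]

# Column key = timestamp truncated to seconds: later clicks overwrite earlier ones.
by_second = {}
for ts in click_times:
    by_second[ts.strftime("%Y-%m-%d %H:%M:%S")] = b""

# Column key = a time-based UUID per click: every click keeps its own column.
by_timeuuid = {}
for _ in click_times:
    by_timeuuid[uuid.uuid1()] = b""

print(len(by_second), len(by_timeuuid))  # 1 3
```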

Re: User click count

2014-12-29 Thread Alain RODRIGUEZ
Hi Ajay, here is a good explanation you might want to read: http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters We have been using counters for 3 years now, since C* 0.8, and we are happy with them. The limits I can see in both approaches are: Count

Re: User click count

2014-12-29 Thread Ajay
Hi, so you mean to say counters are not accurate? (It is highly likely that multiple parallel threads will try to increment the counter as users click the links.) Thanks Ajay
On Mon, Dec 29, 2014 at 4:49 PM, Janne Jalkanen wrote:
> Hi!
> It’s really a tradeoff between accurate and fast and

Re: User click count

2014-12-29 Thread Janne Jalkanen
Hi! It’s really a tradeoff between accurate and fast and your read access patterns; if you need it to be fairly fast, use counters by all means, but accept the fact that they will (especially in older versions of Cassandra or under adverse network conditions) drift off from the true click count. If
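One well-known way counters drift under bad network conditions is a client retrying an increment that timed out but was actually applied. The toy model below is a simplification for illustration, not Cassandra's internal counter implementation. `FlakyNode` and `with_retry` are hypothetical. It contrasts a non-idempotent increment with an idempotent insert of a click id: replaying the increment double counts, while replaying the insert changes nothing.

```python
class FlakyNode:
    """Toy model: every write is applied, but the acknowledgement of the next
    write can be 'lost', so the client sees a timeout and retries."""

    def __init__(self):
        self.counter = 0
        self.click_ids = set()
        self.lose_next_ack = False

    def _ack(self):
        if self.lose_next_ack:
            self.lose_next_ack = False
            raise TimeoutError("write timed out, but it was applied")

    def increment(self, n=1):
        self.counter += n             # not idempotent: a replay adds n again
        self._ack()

    def insert_click(self, click_id):
        self.click_ids.add(click_id)  # idempotent: a replay changes nothing
        self._ack()


def with_retry(op, *args, attempts=3):
    """Naive client: retry on timeout, as an application might."""
    for _ in range(attempts):
        try:
            return op(*args)
        except TimeoutError:
            continue
    raise TimeoutError("gave up after retries")


node = FlakyNode()
node.lose_next_ack = True
with_retry(node.increment)                # one real click, counted via the counter
node.lose_next_ack = True
with_retry(node.insert_click, "click-1")  # the same click, stored as an id
print(node.counter, len(node.click_ids))  # 2 1
```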