Thanks Eric.
Happy New Year 2015 to all Cassandra developers and users :). This group
seems to be the most active of the Apache big data projects.
Will come back with more questions :)
Thanks
Ajay
On Dec 31, 2014 8:02 PM, "Eric Stevens" wrote:
You can totally avoid the impact of tombstones by rotating your partition
key in the exact counts table, and only deleting whole partitions once
you've counted them. Once you've counted them you never have cause to read
that partition key again.
You can totally store the final counts in Cassandra
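
A minimal sketch of the rotating-partition idea above, using an in-memory dict as a stand-in for the exact counts table. The hourly bucket format and the names `bucket_for`, `ExactClicks`, `record` and `roll_up` are assumptions made for illustration, not anything from Cassandra or its drivers:

```python
from collections import defaultdict
from datetime import datetime, timezone


def bucket_for(ts: datetime) -> str:
    """Hourly bucket (UTC) used as the rotating part of the partition key."""
    return ts.astimezone(timezone.utc).strftime("%Y%m%d%H")


class ExactClicks:
    """In-memory stand-in for the exact counts table (hypothetical)."""

    def __init__(self):
        # (link_id, bucket) -> list of click ids; one entry per partition
        self.partitions = defaultdict(list)
        # (link_id, bucket) -> final count, kept after the partition is gone
        self.final_counts = {}

    def record(self, link_id, click_id, ts):
        """Writes only ever touch the partition for the current bucket."""
        self.partitions[(link_id, bucket_for(ts))].append(click_id)

    def roll_up(self, now):
        """Count every closed bucket once, then drop the whole partition."""
        current = bucket_for(now)
        # Bucket strings sort chronologically, so "<" means "earlier hour".
        closed = [key for key in self.partitions if key[1] < current]
        for key in closed:
            self.final_counts[key] = len(self.partitions.pop(key))
        return closed
```

Because a counted partition is dropped as a whole and its key is never read again, later reads do not have to scan over the deleted cells, which is the point Eric makes about tombstones.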
Thanks Janne and Rob.
The idea is this: store the user clicks in Cassandra, and have a
scheduler count/aggregate the clicks per link or ad
hourly/daily/monthly and store the results in MySQL (or maybe in Cassandra itself).
Since tombstones will be deleted only after some days (as per
configuration),
On Mon, Dec 29, 2014 at 6:05 AM, Ajay wrote:
> In my case, Cassandra is the only storage. If the counters become incorrect,
> they couldn't be corrected.
>
Cassandra counters are not appropriate for this use case, if correctness is
a requirement.
=Rob
Hi!
Yes, since all the writes for a partition (or row if you speak Thrift) always
go to the same replicas, you will need to design to avoid hotspots - a pure day
row will cause all the writes for a single day to go to the same replicas, so
those nodes will have to work really hard for a day, a
Thanks Janne, Alain and Eric.
Now say I go with counters (hourly, daily, monthly) and also store UUID as
below:
user Id : /mm/dd as row key and dynamic columns for each click with
column key as timestamp and value as empty. Periodically count the columns
and rows and correct the counters. Now
> If the counters become incorrect, they couldn't be corrected
You'd have to store something that allowed you to correct it. For example,
the TimeUUID approach to keep true counts, which are slow to read but
accurate, and a background process that trues up your counter columns
periodically.
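
A rough sketch of that approach, with in-memory dicts standing in for the two tables. The class and method names (`ClickStore`, `click`, `true_up`) are hypothetical, and `uuid.uuid1()` is used here only as a stand-in for a TimeUUID:

```python
import uuid
from collections import defaultdict


class ClickStore:
    """Fast counter plus slow-but-exact event set (illustrative only)."""

    def __init__(self):
        self.events = defaultdict(set)    # link_id -> set of TimeUUIDs
        self.counters = defaultdict(int)  # link_id -> fast, possibly drifting count

    def click(self, link_id, event_id=None):
        """Record one click in both places and return its event id."""
        event_id = event_id or uuid.uuid1()
        # Re-adding the same id is a no-op, so retried writes stay exact here.
        self.events[link_id].add(event_id)
        # The counter increment is not idempotent; a retry over-counts.
        self.counters[link_id] += 1
        return event_id

    def true_up(self):
        """Background job: reset each counter to the exact event count."""
        corrections = {}
        for link_id, ids in self.events.items():
            drift = self.counters[link_id] - len(ids)
            if drift:
                corrections[link_id] = drift
                self.counters[link_id] = len(ids)
        return corrections
```

Reads that need speed use `counters`; the exact set is only scanned by the periodic job, which reports how far each counter had drifted.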
On Mon, De
Thanks for the clarification.
In my case, Cassandra is the only storage. If the counters become incorrect,
they couldn't be corrected. If we have to store raw data for that, we might as well go
with that approach. But the granularity has to be at the seconds level, since more than
one user can click the same link. So the data
Hi Ajay,
Here is a good explanation you might want to read.
http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters
We have used counters for 3 years now, starting with C* 0.8, and
we are happy with them. The limits I can see in both approaches are:
Count
Hi,
So you mean to say counters are not accurate? (It is highly likely that
multiple parallel threads will be trying to increment the counter as users click
the links.)
Thanks
Ajay
On Mon, Dec 29, 2014 at 4:49 PM, Janne Jalkanen
wrote:
>
> Hi!
>
> It’s really a tradeoff between accurate and fast and
Hi!
It’s really a tradeoff between accurate and fast and your read access patterns;
if you need it to be fairly fast, use counters by all means, but accept the
fact that they will (especially in older versions of Cassandra or adverse
network conditions) drift off from the true click count. If