The implementation of distributed counters is more complicated than your example; there is a design doc attached to the ticket here: https://issues.apache.org/jira/browse/CASSANDRA-1072
By collapsing some of those +1 increments together at the application level there is less work for the cluster to do. This can be important when the numbers are big http://blog.twitter.com/2011/03/numbers.html

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 21 May 2011, at 09:04, Yang wrote:

> (sorry if Rainbird is not a topic relevant enough, I'd appreciate if
> someone could point me to a more appropriate venue in that case)
>
>
> Rainbird buffers up 1 minute worth of events first before writing to
> Cassandra.
>
> it seems that this extra layer of buffering is repetitive, and could
> be avoided : Cassandra's memtable already does buffering, whose
> internal implementation is just
> Map.put(key, CF ) , I guess rainbird does similar things :
> column_to_count = map.get(key); column_to_count++ ; map.put(key,
> column_to_count) ??
> the "++" part is probably already done by the Distributed Counters in
> Cassandra.
> then I guess Rainbird layer exists because it needs to parse an
> incoming event into various attributes that it is interested in: for
> example from an url, we bump up the counts of
> FQDN , domain, path etc, Rainbird does the transformation from
> url--->3 attrs.
>
> but I guess that transformation might as well be done in the cassandra
> JVM itself, if we could provide some hooks, so that a module
> translates incoming request into
> multiple keys, and bump up their counts. that way we avoid the
> intermediate communication from clients to rainbird, and rainbird to
> Cassandra. are there some points I'm missing?
>
> Thanks
> Yang
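
For illustration only, here is a minimal Python sketch of collapsing +1 increments at the application level before they reach the cluster. It is not Rainbird's code and makes no claim about how Cassandra counters work internally. The names `keys_for_url`, `IncrementBuffer` and the `write_delta` callback are hypothetical, and the "domain" rule (last two labels of the host name) is a deliberately naive assumption.

```python
from collections import Counter
from urllib.parse import urlsplit


def keys_for_url(url):
    """Split a URL into the three counter keys from the quoted example.

    Assumption: "domain" is the last two labels of the host, which is
    wrong for suffixes such as .co.uk.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    domain = ".".join(host.split(".")[-2:])
    return [("fqdn", host), ("domain", domain), ("path", host + parts.path)]


class IncrementBuffer:
    """Collapse many +1 events into one delta per key before writing."""

    def __init__(self, write_delta):
        # write_delta(key, delta) is a hypothetical callback standing in
        # for whatever actually sends the counter update to the cluster.
        self._write_delta = write_delta
        self._pending = Counter()

    def record(self, url):
        # One event bumps each of the three keys derived from the URL.
        for key in keys_for_url(url):
            self._pending[key] += 1

    def flush(self):
        """Send one write per key; return how many writes were issued."""
        pending, self._pending = self._pending, Counter()
        for key, delta in pending.items():
            self._write_delta(key, delta)
        return len(pending)


if __name__ == "__main__":
    writes = []
    buf = IncrementBuffer(lambda key, delta: writes.append((key, delta)))
    for _ in range(1000):
        buf.record("http://www.example.com/a")
    buf.record("http://blog.example.com/b")
    # In practice flush() would be driven by a timer, e.g. once a minute.
    issued = buf.flush()
    print(issued, "writes instead of", 1001 * 3, "single increments")
```

The trade-off is that counts still sitting in the buffer are lost if the process dies before a flush, so the flush interval bounds how much can go missing.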