Re: rainbird question (why is the 1minute buffer needed?)

aaron morton Sun, 22 May 2011 03:47:59 -0700

The implementation of distributed counters is more complicated than your
example. There is a design doc attached to the ticket here:
https://issues.apache.org/jira/browse/CASSANDRA-1072


By collapsing some of those +1 increments together at the application level,
there is less work for the cluster to do. This can be important when the
numbers are big: http://blog.twitter.com/2011/03/numbers.html
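
To make the collapsing idea concrete, here is a minimal sketch in Python. It is
not Rainbird's code: the IncrementBuffer class and the flush callback are
hypothetical names used for illustration, and the callback stands in for
whatever client call would send counter increments to the cluster. The caller
is assumed to invoke flush() on a timer, for example once a minute.

```python
from collections import Counter
from typing import Callable, Dict


class IncrementBuffer:
    """Collapses many +1 events per key into one summed delta per flush."""

    def __init__(self, flush: Callable[[Dict[str, int]], None]) -> None:
        self._pending: Counter = Counter()
        # Hypothetical sink; in practice this would issue the counter writes.
        self._flush = flush

    def record(self, key: str, delta: int = 1) -> None:
        # Purely in-memory; nothing is sent to the cluster here.
        self._pending[key] += delta

    def flush(self) -> int:
        """Send one delta per distinct key; return how many deltas were sent."""
        if not self._pending:
            return 0
        batch = dict(self._pending)
        self._pending.clear()
        self._flush(batch)
        return len(batch)
```

With this shape, 1000 events for the same key inside one window reach the
cluster as a single increment of 1000. The cost is that anything still in the
buffer is lost if the process dies before the next flush.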

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 21 May 2011, at 09:04, Yang wrote:

> (Sorry if Rainbird is not a relevant enough topic; I'd appreciate it if
> someone could point me to a more appropriate venue in that case.)
> 
> 
> Rainbird buffers up 1 minute's worth of events before writing to
> Cassandra.
> 
> It seems that this extra layer of buffering is redundant and could
> be avoided: Cassandra's memtable already does buffering, and its
> internal implementation is just
> Map.put(key, CF). I guess Rainbird does something similar:
> column_to_count = map.get(key); column_to_count++; map.put(key,
> column_to_count)?
> The "++" part is probably already done by the Distributed Counters in
> Cassandra.
> Then I guess the Rainbird layer exists because it needs to parse an
> incoming event into the various attributes it is interested in. For
> example, from a URL we bump up the counts of
> FQDN, domain, path, etc.; Rainbird does the transformation from
> url--->3 attrs.
> 
> But I guess that transformation might as well be done in the Cassandra
> JVM itself, if we could provide some hooks so that a module
> translates an incoming request into
> multiple keys and bumps up their counts. That way we avoid the
> intermediate communication from clients to Rainbird, and from Rainbird to
> Cassandra. Are there some points I'm missing?
> 
> Thanks
> Yang
