(Sorry if Rainbird is not a relevant enough topic for this list; if so, I'd appreciate it if someone could point me to a more appropriate venue.)
Rainbird buffers up one minute's worth of events before writing to Cassandra. It seems to me that this extra layer of buffering is redundant and could be avoided: Cassandra's memtable already does buffering, and its internal implementation is essentially just Map.put(key, CF). I guess Rainbird does something similar: column_to_count = map.get(key); column_to_count++; map.put(key, column_to_count)? The "++" part is probably already handled by the distributed counters in Cassandra.

So I guess the Rainbird layer exists because it needs to parse each incoming event into the various attributes it is interested in. For example, from a URL we bump up the counts for the FQDN, the domain, the path, etc.; Rainbird does the transformation from one URL to 3 attributes. But I guess that transformation might as well be done inside the Cassandra JVM itself, if we could provide some hooks so that a module translates an incoming request into multiple keys and bumps up their counts. That way we would avoid the intermediate hops from clients to Rainbird and from Rainbird to Cassandra.

Are there some points I'm missing?

Thanks,
Yang