> How to do it? Do I need to build a custom plugin/sink or can I configure an
> existing sink to write data in a custom way?

This is a good starting point: https://github.com/thobbs/flume-cassandra-plugin
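
If you do end up writing your own sink, the rough shape is one class that
stores the raw event and bumps a counter in the same append() call. Untested
sketch, assuming the CDH3 Flume (OG) EventSink plugin API and the Hector
client; the names (LogStatsSink, "logs", raw_events, daily_counts) are made
up, and the SinkBuilder / flume.plugin.classes registration is left out:

    import java.io.IOException;
    import java.text.SimpleDateFormat;
    import java.util.Date;

    import com.cloudera.flume.core.Event;
    import com.cloudera.flume.core.EventSink;

    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.mutation.Mutator;

    public class LogStatsSink extends EventSink.Base {
      private static final StringSerializer SS = StringSerializer.get();
      private Keyspace keyspace;

      @Override
      public void open() throws IOException {
        // One keyspace handle for the life of the sink.
        keyspace = HFactory.createKeyspace("logs",
            HFactory.getOrCreateCluster("Main", "localhost:9160"));
      }

      @Override
      public void append(Event e) throws IOException {
        Mutator<String> m = HFactory.createMutator(keyspace, SS);
        // 1. Store the raw event, keyed to suit your read pattern.
        m.insert(e.getHost() + ":" + e.getTimestamp(), "raw_events",
            HFactory.createColumn("body", new String(e.getBody()), SS, SS));
        // 2. Bump a near real-time counter, here events per host per day.
        m.incrementCounter(e.getHost(), "daily_counts",
            dayBucket(e.getTimestamp()), 1L);
      }

      private String dayBucket(long tsMillis) {
        return new SimpleDateFormat("yyyyMMdd").format(new Date(tsMillis));
      }

      @Override
      public void close() throws IOException {
        // Nothing to release; Hector pools connections at the cluster level.
      }
    }

Counters need Cassandra 0.8+ and a counter column family; doing the increment
on the write path is what gives you the near real-time stats without a
separate batch job.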

> 2 - My business process also uses my Cassandra DB (without flume, directly
> via thrift), how to ensure that log writing won't overload my database and
> introduce latency in my business process?

Anytime you have a data stream you don't control, it's a good idea to put
some sort of buffer between the outside world and the database. Flume has a
buffered sink; I think you can subclass it and aggregate the counters for a
minute or two (rough sketch of the idea at the bottom, below the quote):
http://archive.cloudera.com/cdh/3/flume/UserGuide/#_buffered_sink_and_decorator_semantics

Hope that helps.
A

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 10/02/2012, at 4:27 AM, Alain RODRIGUEZ wrote:

> Hi,
>
> 1 - I would like to generate some statistics and store some raw events from
> log files tailed with flume. I saw some plugins providing Cassandra sinks
> but I would like to store data in a custom way, storing raw data but also
> incrementing counters to get near real-time statistics. How to do it? Do I
> need to build a custom plugin/sink or can I configure an existing sink to
> write data in a custom way?
>
> 2 - My business process also uses my Cassandra DB (without flume, directly
> via thrift), how to ensure that log writing won't overload my database and
> introduce latency in my business process? I mean, is there a way to manage
> the throughput sent by Flume's tails and slow them down when my Cassandra
> cluster is overloaded? I would like to avoid building two separate clusters.
>
> Thank you,
>
> Alain
>
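
PS here is the aggregation sketch I mentioned. It is just the idea, not the
actual buffered sink / decorator API, and CounterBuffer / CounterWriter are
names I made up; you would call increment() from append() and plug the writer
into whatever client you already use (Hector, raw Thrift, ...):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicLong;

    public class CounterBuffer {
      private final ConcurrentHashMap<String, AtomicLong> pending =
          new ConcurrentHashMap<String, AtomicLong>();
      private final ScheduledExecutorService flusher =
          Executors.newSingleThreadScheduledExecutor();
      private final CounterWriter writer;

      public CounterBuffer(CounterWriter writer, long flushSeconds) {
        this.writer = writer;
        flusher.scheduleAtFixedRate(new Runnable() {
          public void run() { flush(); }
        }, flushSeconds, flushSeconds, TimeUnit.SECONDS);
      }

      /** Called once per event; in-memory only, no Cassandra round trip. */
      public void increment(String counterKey) {
        AtomicLong c = pending.get(counterKey);
        if (c == null) {
          AtomicLong fresh = new AtomicLong();
          c = pending.putIfAbsent(counterKey, fresh);
          if (c == null) c = fresh;
        }
        c.incrementAndGet();
      }

      /** One counter add per distinct key per interval, whatever the rate. */
      private void flush() {
        for (Map.Entry<String, AtomicLong> e : pending.entrySet()) {
          long delta = e.getValue().getAndSet(0);
          if (delta > 0) writer.add(e.getKey(), delta);
        }
      }

      /** Flush once more and stop the timer, e.g. from the sink's close(). */
      public void stop() {
        flusher.shutdown();
        flush();
      }

      /** Adapter to whatever client you use (Hector, raw Thrift, ...). */
      public interface CounterWriter {
        void add(String counterKey, long delta);
      }
    }

With a one minute window, a burst of 10,000 log lines that hit the same
counter key turns into a single +10000 add, so the counter write load on the
cluster stays roughly flat no matter how fast the tails run.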