> How do I do it? Do I need to build a custom plugin/sink, or can I configure an 
> existing sink to write data in a custom way?
This is a good starting point: https://github.com/thobbs/flume-cassandra-plugin
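
If you need a custom schema (raw rows plus counter increments), that plugin is a reasonable template to copy from. As a rough sketch only, assuming Flume's (OG) EventSink.Base API and Hector for the Cassandra writes, the append() could do both in one pass. The "Logs" cluster, "logs" keyspace and the RawEvents/Counters column families below are made up for illustration, and the plugin boilerplate (SinkBuilder, flume-site.xml registration) is omitted:

  // Rough sketch: a custom Flume (OG) sink that stores the raw event and
  // bumps a per-minute counter. Names of the cluster, keyspace and column
  // families are hypothetical.
  import java.io.IOException;
  import java.util.UUID;

  import com.cloudera.flume.core.Event;
  import com.cloudera.flume.core.EventSink;

  import me.prettyprint.cassandra.serializers.StringSerializer;
  import me.prettyprint.hector.api.Cluster;
  import me.prettyprint.hector.api.Keyspace;
  import me.prettyprint.hector.api.factory.HFactory;
  import me.prettyprint.hector.api.mutation.Mutator;

  public class StatsCassandraSink extends EventSink.Base {
    private Keyspace keyspace;

    @Override
    public void open() throws IOException, InterruptedException {
      Cluster cluster = HFactory.getOrCreateCluster("Logs", "localhost:9160");
      keyspace = HFactory.createKeyspace("logs", cluster);
    }

    @Override
    public void append(Event e) throws IOException, InterruptedException {
      Mutator<String> m = HFactory.createMutator(keyspace, StringSerializer.get());

      // Store the raw event body under a random row key (hypothetical schema).
      m.insert(UUID.randomUUID().toString(), "RawEvents",
          HFactory.createStringColumn("body", new String(e.getBody())));

      // Near real-time stats: one counter row per minute, bumped per event.
      String minuteRow = Long.toString(e.getTimestamp() / 60000L);
      m.incrementCounter(minuteRow, "Counters", "events", 1L);

      super.append(e);
    }
  }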

> 2 - My business process also uses my Cassandra DB (without Flume, directly via 
> thrift); how do I ensure that log writing won't overload my database and 
> introduce latency into my business process?
Anytime you have a data stream you don't control, it's a good idea to put some 
sort of buffer between the outside world and the database. Flume has a 
buffered sink; I think you can subclass it and aggregate the counters for a 
minute or two: 
http://archive.cloudera.com/cdh/3/flume/UserGuide/#_buffered_sink_and_decorator_semantics
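
For the aggregation, something along these lines might work as a decorator in front of the Cassandra sink. Again just a sketch, assuming Flume OG's EventSinkDecorator; how the key is derived and the "key=count" format of the flushed events are placeholders:

  // Rough sketch: aggregates counter increments in memory and flushes one
  // event per key roughly every 60 seconds, so Cassandra sees a few large
  // increments instead of one per log line.
  import java.io.IOException;
  import java.util.HashMap;
  import java.util.Map;

  import com.cloudera.flume.core.Event;
  import com.cloudera.flume.core.EventImpl;
  import com.cloudera.flume.core.EventSink;
  import com.cloudera.flume.core.EventSinkDecorator;

  public class CounterAggregator<S extends EventSink> extends EventSinkDecorator<S> {
    private final Map<String, Long> counts = new HashMap<String, Long>();
    private long lastFlush = System.currentTimeMillis();

    public CounterAggregator(S sink) {
      super(sink);
    }

    @Override
    public synchronized void append(Event e) throws IOException, InterruptedException {
      // Count events by some key; using the whole body is just a placeholder.
      String key = new String(e.getBody());
      Long current = counts.get(key);
      counts.put(key, current == null ? 1L : current + 1L);

      if (System.currentTimeMillis() - lastFlush >= 60000L) {
        for (Map.Entry<String, Long> entry : counts.entrySet()) {
          // Forward one "key=count" event per counter; the downstream sink
          // can turn it into a single increment of size entry.getValue().
          super.append(new EventImpl(
              (entry.getKey() + "=" + entry.getValue()).getBytes()));
        }
        counts.clear();
        lastFlush = System.currentTimeMillis();
      }
    }
  }

Flushing once a minute means a burst of logs turns into one increment per counter key, which keeps pressure off the cluster your business process is using.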

Hope that helps. 
A
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 10/02/2012, at 4:27 AM, Alain RODRIGUEZ wrote:

> Hi,
> 
> 1 - I would like to generate some statistics and store some raw events from 
> log files tailed with Flume. I saw some plugins providing Cassandra sinks, but 
> I would like to store data in a custom way, storing raw data but also 
> incrementing counters to get near real-time statistics. How do I do it? Do I 
> need to build a custom plugin/sink, or can I configure an existing sink to 
> write data in a custom way?
> 
> 2 - My business process also uses my Cassandra DB (without Flume, directly via 
> thrift); how do I ensure that log writing won't overload my database and 
> introduce latency into my business process? I mean, is there a way to 
> manage the throughput sent by Flume's tail sources and slow them down when my 
> Cassandra cluster is overloaded? I would like to avoid building two separate 
> clusters.
> 
> Thank you,
> 
> Alain
> 
