I have a web application that writes multiple log files to a single log 
directory. On a particularly chatty box, up to 2000 entries per second are 
written to those log files. We are looking for a solution that tails that 
directory and inserts new entries into a Cassandra database. 

The fields in the log file are pipe-delimited, but we can separate the data 
points with any delimiter. We want to structure the data so that each 
data point gets its own column when it is inserted into Cassandra. 
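
As a rough illustration of the mapping we have in mind, here is a minimal 
Python sketch. The column names, the table name, and the sample line are 
made up for the example (our real fields are not listed here), and the %s 
placeholder style assumes a client that accepts that form of parameterized 
CQL. It only builds the statement; it does not talk to Cassandra.

```python
# Hypothetical column names; the real log fields are not listed in this post.
COLUMNS = ["ts", "host", "level", "message"]


def parse_line(line, delimiter="|"):
    """Split one log entry into a column-name -> value mapping."""
    values = line.rstrip("\n").split(delimiter)
    if len(values) != len(COLUMNS):
        # Refuse malformed entries instead of silently dropping data.
        raise ValueError(
            f"expected {len(COLUMNS)} fields, got {len(values)}"
        )
    return dict(zip(COLUMNS, values))


def build_insert(table, row):
    """Build a parameterized CQL INSERT and its values for one parsed row."""
    names = ", ".join(row)
    placeholders = ", ".join(["%s"] * len(row))
    statement = f"INSERT INTO {table} ({names}) VALUES ({placeholders})"
    return statement, list(row.values())


if __name__ == "__main__":
    # Made-up sample entry in the pipe-delimited format described above.
    sample = "2012-01-01T00:00:00|web01|INFO|request served\n"
    stmt, params = build_insert("logs", parse_line(sample))
    print(stmt)
    print(params)
```

Raising on a field-count mismatch is deliberate: since lost data is a big 
problem for us, a malformed entry should be surfaced rather than skipped.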

We set up Flume to handle this, but the Cassandra sink isn't robust enough to 
handle even one chatty machine. We may have up to 200 machines.

Any suggestions on a tool that can do this reliably? Data not making it into 
Cassandra will cause huge problems, so reliability is a key factor.

Regards,

Trevor Francis

