I have a web application that writes multiple log files to a log directory. On a particularly chatty box, up to 2,000 entries per second are written to those files. We are looking for a solution that tails that directory and inserts new entries into a Cassandra database.
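For reference, the tailing part of the problem can be sketched in a few lines of Python. This is a minimal polling sketch, not a replacement for a dedicated tool. `DirectoryTailer` and the `*.log` pattern are hypothetical. The sketch assumes append-only, newline-terminated UTF-8 files, and it keeps offsets only in memory. A real deployment would need to persist those offsets and advance them only after Cassandra has acknowledged the write.

```python
from pathlib import Path


class DirectoryTailer:
    """Hypothetical helper: polls a directory and returns lines appended since the last poll.

    Assumes append-only, newline-terminated UTF-8 log files. Offsets are held
    in memory only; a real deployment would persist them.
    """

    def __init__(self, directory, pattern="*.log"):
        self.directory = Path(directory)
        self.pattern = pattern  # hypothetical file-name pattern
        self.offsets = {}  # file path -> byte offset already consumed

    def poll(self):
        new_lines = []
        for path in sorted(self.directory.glob(self.pattern)):
            offset = self.offsets.get(path, 0)
            if path.stat().st_size < offset:
                # File shrank: assume it was truncated or rotated and start over.
                offset = 0
            with path.open("rb") as f:
                f.seek(offset)
                chunk = f.read()
            # Consume only up to the last newline. A partial trailing line is
            # left in place and picked up once the writer finishes it.
            end = chunk.rfind(b"\n") + 1
            for raw in chunk[:end].splitlines():
                new_lines.append((path.name, raw.decode("utf-8")))
            self.offsets[path] = offset + end
        return new_lines
```

Polling keeps the sketch portable. An inotify-based watcher would cut latency, but it still needs the same offset bookkeeping to avoid losing or re-reading lines after a restart.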
The fields in the log files are pipe-delimited, but we can switch to any other delimiter if needed. We want to structure the data so that each data point gets its own column when it is inserted into Cassandra.

We set up Flume to handle this, but its Cassandra sink isn't robust enough to keep up with even one chatty machine, and we may have up to 200 machines (in the worst case, at 2,000 entries per second each, that approaches 400,000 entries per second in total).

Can anyone suggest a tool that can do this reliably? Data that doesn't make it into the Cassandra database will cause huge problems, so reliability is a major factor.

Regards,
Trevor Francis
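The per-column mapping and the no-data-loss requirement can be sketched together, again in Python using only the standard library. Everything named here is hypothetical: the `COLUMNS` field order, the `logs.entries` table, and `write`, which stands in for whatever client call actually executes the statement against Cassandra. The idea is to split each line into named fields and insert them with a parameterized statement (the `?` placeholder form used for prepared statements). Failed writes are retried rather than dropped.

```python
import time

# Hypothetical schema: the real field order comes from the application's log format.
COLUMNS = ("ts", "host", "level", "message")
TABLE = "logs.entries"  # hypothetical keyspace.table


def parse_line(line, delimiter="|"):
    """Split one delimited log line into a {column: value} dict."""
    # maxsplit keeps any delimiter inside the last field (the message) intact.
    fields = line.split(delimiter, len(COLUMNS) - 1)
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}: {line!r}")
    return dict(zip(COLUMNS, fields))


def insert_statement():
    """Build a parameterized CQL INSERT with one column per data point."""
    cols = ", ".join(COLUMNS)
    marks = ", ".join("?" for _ in COLUMNS)
    return f"INSERT INTO {TABLE} ({cols}) VALUES ({marks})"


def write_with_retry(write, row, attempts=5, base_delay=0.1):
    """Call the hypothetical write(row) until it succeeds, backing off exponentially.

    Any exception is treated as retryable in this sketch. After the final
    attempt the error is re-raised, so the caller must not advance its file
    offset; the line is then retried later instead of being lost.
    """
    for attempt in range(attempts):
        try:
            return write(row)
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
```

A CQL INSERT on an existing primary key overwrites that row rather than adding a second one. Retrying a line that may already have been written therefore does not create duplicates, provided the primary key is derived deterministically from the log entry (for example, from host, file name, and byte offset).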