Hi,

First, I'd like to thank the kafka developers for writing kafka.
This is an announcement of the first release of a filesystem logging agent based on kafka. It collects logs from servers running all kinds of software, as a generic way to gather logs without needing to know anything about each individual logger.

Home: https://github.com/yazgoo/fuse_kafka

Here are some of its features:
- sends all writes under the given directories to kafka
- passes filesystem syscalls through to the underlying directory
- captures the pid, gid, uid, user, group and command line doing the write
- lets you add metadata identifying where a message comes from (e.g. an IP address)
- lets you configure the destination kafka cluster by giving either a broker list or a zookeeper list
- lets you set a bandwidth quota: fuse_kafka won't send data if a file is written to at more than a given size per second (useful to prevent floods caused by dumped core files or by log rotations in directories watched by fuse_kafka)

It is based on:
- FUSE (Filesystem in Userspace), to capture writes done under a given directory
- kafka (a messaging queue), as the event transport system
- logstash: events are written to kafka in logstash format (except messages and commands, which are stored in base64)

It is written in C and python. Packages are provided for various distros; see the installing section in README.md.

FUSE adds overhead, so fuse_kafka should not be used on filesystems where high throughput is necessary. Benchmarks are available here:
http://htmlpreview.github.io/?https://raw.githubusercontent.com/yazgoo/fuse_kafka/master/benchs/benchmarks.html

Contributions are welcome, of course!

Regards
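To give an idea of what "logstash format with base64-encoded messages and commands" looks like, here is a minimal Python sketch. The exact field names and layout fuse_kafka uses may differ; this is an illustration of the idea (base64 keeps arbitrary log bytes safe inside a JSON event), not the project's actual serializer.

```python
import base64
import json
from datetime import datetime, timezone

def build_event(message, command, pid, uid, gid, user, group, path):
    """Build a logstash-style JSON event; field names are illustrative.

    The raw write payload and the writing command are base64-encoded so
    that arbitrary bytes survive inside a JSON document.
    """
    return json.dumps({
        "@version": "1",
        "@timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ"),
        "path": path,
        "pid": pid,
        "uid": uid,
        "gid": gid,
        "user": user,
        "group": group,
        "message": base64.b64encode(message).decode("ascii"),
        "command": base64.b64encode(command.encode()).decode("ascii"),
    })

# Round-trip: a consumer decodes the base64 fields to recover the payload.
event = json.loads(build_event(b"a log line\n", "nginx -g 'daemon off;'",
                               1234, 33, 33, "www-data", "www-data",
                               "/var/log/nginx/access.log"))
print(base64.b64decode(event["message"]))  # b'a log line\n'
```

A consumer reading such events from kafka only needs to base64-decode the `message` (and `command`) fields before indexing them.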
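The bandwidth quota above can be pictured as a per-file bytes-per-second counter: once a file exceeds the allowed bytes within the current one-second window, further writes are dropped from the event stream (while still passing through to disk). The following Python sketch shows one way such a check might work; it is an illustration under assumed semantics, not fuse_kafka's actual implementation.

```python
import time
from collections import defaultdict

class WriteQuota:
    """Per-file bytes-per-second quota (illustrative sketch).

    allow() returns False when sending `size` more bytes for `path`
    would exceed the quota within the current one-second window.
    """
    def __init__(self, max_bytes_per_second):
        self.max_bps = max_bytes_per_second
        # path -> [window_start_time, bytes_sent_in_window]
        self.windows = defaultdict(lambda: [0.0, 0])

    def allow(self, path, size, now=None):
        now = time.monotonic() if now is None else now
        start, sent = self.windows[path]
        if now - start >= 1.0:            # a new one-second window begins
            start, sent = now, 0
        if sent + size > self.max_bps:    # over quota: drop this event
            self.windows[path] = [start, sent]
            return False
        self.windows[path] = [start, sent + size]
        return True

quota = WriteQuota(1024)
print(quota.allow("/var/log/app.log", 800, now=0.0))  # True: 800 <= 1024
print(quota.allow("/var/log/app.log", 800, now=0.5))  # False: 1600 > 1024
print(quota.allow("/var/log/app.log", 800, now=1.5))  # True: new window
```

Dropping over-quota writes (rather than queueing them) is what makes this useful against floods from core dumps or log rotations: the filesystem stays usable while the kafka stream is protected.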