Hi,

First, I'd like to thank the kafka developers for writing kafka.

This is an announcement for the first release of a file system logging
agent based on kafka.

It collects logs from servers running all kinds of software, providing
a generic way to gather logs without needing to know about each logger.

Home:
https://github.com/yazgoo/fuse_kafka

Here are some of its features:

   - sends all writes to given directories to kafka
   - passes through FS syscalls to underlying directory
   - captures the pid, gid, uid, user, group, and command line of the
   process doing the write
   - you can add metadata to identify where the message comes from
   (e.g. IP address, ...)
   - you can configure the destination kafka cluster by giving either a
   broker list or a zookeeper list
   - you can specify a bandwidth quota: fuse_kafka won't send data if a
   file is written to faster than a given size per second (useful for
   preventing floods caused by dumped core files or log rotations in
   directories watched by fuse_kafka)
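To illustrate the bandwidth quota idea, here is a minimal per-file
rate-limiter sketch in Python. This is not fuse_kafka's actual
implementation (which is in C); the class name and one-second window
are assumptions made for the example:

```python
import time


class BandwidthQuota:
    """Illustrative per-file quota: allow at most max_bytes_per_second
    of writes in any one-second window; excess writes are not sent."""

    def __init__(self, max_bytes_per_second):
        self.max_bytes = max_bytes_per_second
        self.window_start = time.monotonic()
        self.bytes_in_window = 0

    def allow(self, size):
        now = time.monotonic()
        # Start a fresh one-second accounting window when the old one expires.
        if now - self.window_start >= 1.0:
            self.window_start = now
            self.bytes_in_window = 0
        # Over quota: report that this write should not be forwarded.
        if self.bytes_in_window + size > self.max_bytes:
            return False
        self.bytes_in_window += size
        return True


quota = BandwidthQuota(1024)      # 1 KiB per second
print(quota.allow(512))           # first 512 bytes pass
print(quota.allow(512))           # still within the 1 KiB budget
print(quota.allow(1))             # one byte more is dropped this second
```

A core dump landing in a watched directory would exhaust the budget
almost immediately, so only the first chunk would reach kafka instead
of the whole flood.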

It is based on:

   - FUSE (filesystem in userspace), to capture writes done under a given
   directory
   - kafka (messaging queue), as the event transport system
   - logstash: events are written to kafka in logstash format (except
   the message and command fields, which are base64-encoded)
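As a sketch of what such an event could look like, here is a small
Python example that builds a logstash-style JSON event with the message
and command base64-encoded. The exact field names fuse_kafka emits may
differ; `build_event` and the fields shown are illustrative:

```python
import base64
import json
from datetime import datetime, timezone


def build_event(message, command, metadata):
    """Build a logstash-style JSON event; message and command are
    base64-encoded, as described above. Field names are illustrative."""
    return json.dumps({
        "@version": 1,
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "message": base64.b64encode(message.encode()).decode(),
        "command": base64.b64encode(command.encode()).decode(),
        **metadata,  # e.g. host, pid, uid added by the agent
    })


event = build_event("a log line", "/usr/sbin/nginx", {"host": "web-1"})
decoded = json.loads(event)
# Consumers decode the base64 fields back to the original text:
print(base64.b64decode(decoded["message"]).decode())  # → a log line
```

Base64 keeps arbitrary binary writes (and command lines containing odd
bytes) safe inside the JSON envelope.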

It is written in C and python.

Packages are provided for various distros; see the installing section in
README.md.
FUSE adds overhead, so fuse_kafka should not be used on filesystems where
high throughput is required.
Here are benchmarks:
http://htmlpreview.github.io/?https://raw.githubusercontent.com/yazgoo/fuse_kafka/master/benchs/benchmarks.html

Contributions are welcome, of course!

Regards
