Background:

I have the following setup: Apache server >> Apache Kafka producer >> Apache Kafka cluster >> Apache Storm.

In the normal scenario, front-end boxes run the Apache server and populate the log files. The requirement is to read every log line and send it to the Kafka cluster. The Java producer reads the logs from stdin and transfers them to the cluster.

Zero-loss criterion: the contents of the error log files should match the data received by the Kafka cluster per hour, and eventually per day. The error_log files are rotated every hour.

I have already tried a couple of ways to connect the log files to the producer:

1. A custom startup script to start, stop, and check the status of the server:

   tail -n0 -F /var/log/httpd/error_log /var/log/httpd/ssl_error_log | java consumer

2. Hooking directly into Apache via an httpd.conf setting:

   ErrorLog "| /usr/bin/tee -a /var/log/httpd/error_log | java consumer"

In case 1, loss of log lines was observed. The loss was reduced significantly in case 2, where Apache restarts the piped process if it crashes, and also restarts it along with a server restart. Now the remaining loss is seen across restarts of the Apache server itself.

Questions:

1. What is the appropriate way to interface Apache httpd and Kafka?
2. Is there a way to gracefully shut down the Kafka producer so that the pending buffers are flushed before the process dies?
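For question 2, one common pattern is a JVM shutdown hook that flushes the producer before the process exits; the JVM runs shutdown hooks on a normal exit and on SIGTERM, which is how Apache tears down piped log processes during a restart. Below is a minimal, stdlib-only sketch of that pattern under stated assumptions: the class name and the in-memory `pending` list are illustrative stand-ins for the real producer's record accumulator. With the actual kafka-clients library you would call `producer.flush()` and `producer.close()` inside the hook, both of which block until buffered records have been delivered.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

// Sketch of a stdin-to-Kafka bridge with a flush-on-shutdown hook.
// LogBridgeSketch is a hypothetical name; `pending` stands in for the
// producer's internal buffer of not-yet-delivered records.
public class LogBridgeSketch {
    static final List<String> pending = new ArrayList<>();
    static int delivered = 0;

    // Stand-in for KafkaProducer.send(): buffer the record.
    static synchronized void send(String record) { pending.add(record); }

    // Stand-in for producer.flush()/close(): drain everything pending.
    static synchronized void flush() {
        delivered += pending.size();
        pending.clear();
    }

    public static void main(String[] args) throws Exception {
        // Register the hook before reading stdin, so records buffered at
        // the moment Apache tears down the piped logger are still flushed.
        Runtime.getRuntime().addShutdownHook(new Thread(LogBridgeSketch::flush));

        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            send(line);
        }
        // stdin closed (Apache stopped or restarted): the JVM exits and
        // the shutdown hook drains whatever is still pending.
    }
}
```

Note that shutdown hooks do not run on SIGKILL or a JVM crash, so this only closes the window for orderly restarts; loss across a hard kill still has to be reconciled from the error_log files written by tee.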