We use Kafka as the transport layer for application logs. How do we monitor producers at large scale? We have about 6000 boxes with 4 topics per box, so roughly 24000 producers, spread across multiple data centers (we have brokers in each DC). Our monitoring is log-based. I have tried intercepting logs with a custom Log4J appender that extends org.apache.log4j.AppenderSkeleton and whose append method forwards only WARN, ERROR, and FATAL events to the brokers. This works, but under load testing it sometimes causes a deadlock between ProducerSendThread and Producer.
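To make the appender idea concrete, here is a minimal sketch of one way an append method could hand events off without ever blocking the logging thread. It is not our actual appender and I am not claiming it addresses the root cause of the deadlock. The class name NonBlockingLogForwarder and the sink callback are hypothetical. The sink stands in for whatever actually sends to the brokers, and only the JDK is used so the sketch compiles on its own.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/**
 * Hypothetical helper: a bounded, non-blocking hand-off between a log
 * appender's append() and the code that sends to the brokers.
 */
final class NonBlockingLogForwarder {
    private final BlockingQueue<String> queue;
    private final AtomicLong dropped = new AtomicLong();
    private final Thread worker;
    private volatile boolean running = true;

    /** sink is an assumption: it stands in for the real producer send call. */
    NonBlockingLogForwarder(int capacity, Consumer<String> sink) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.worker = new Thread(() -> {
            // Keep draining until close() was called and the queue is empty.
            while (running || !queue.isEmpty()) {
                try {
                    String msg = queue.poll(100, TimeUnit.MILLISECONDS);
                    if (msg != null) {
                        sink.accept(msg);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                } catch (RuntimeException e) {
                    // A failed send must not kill the forwarding thread.
                }
            }
        }, "log-forwarder");
        worker.setDaemon(true);
        worker.start();
    }

    /** Called from append(): never blocks; returns false if the event was not queued. */
    boolean offer(String message) {
        if (Thread.currentThread() == worker) {
            // Events logged by the sink itself are ignored to avoid a feedback loop.
            return false;
        }
        boolean accepted = queue.offer(message);
        if (!accepted) {
            dropped.incrementAndGet(); // queue full: count the drop instead of waiting
        }
        return accepted;
    }

    /** Number of events dropped because the queue was full. */
    long droppedCount() {
        return dropped.get();
    }

    /** Stops accepting work and waits for the queued events to be sent. */
    void close() {
        running = false;
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In this sketch the append method would only call offer with the formatted event, so a full queue costs a dropped event (visible through droppedCount) and never stalls the thread that logged it.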
I know there are JMX MBeans from which we can pull monitoring data, but I would like to monitor exceptions raised inside the Kafka library, e.g. leader not found, queue is full, resend failed, etc. How does LinkedIn monitor its producers?

Thanks,
Bhavesh