Our 0.7.2 Kafka cluster keeps crashing with:

2013-09-24 17:21:47,513 - [kafka-acceptor:Acceptor@153] - Error in acceptor java.io.IOException: Too many open
The obvious fix is to bump up the open-file limit, but I'm wondering whether there is a leak on the Kafka side and/or on our application side. We currently have the ulimit set to a generous 4096, but we are obviously hitting this ceiling. What's a recommended value?

We are running Rails, and our Unicorn workers connect to our Kafka cluster via round-robin load balancing. We have about 1500 workers, so that would be 1500 connections right there, but they should be split across our 3 nodes. Instead, netstat shows thousands of connections that look like this:

tcp 0 0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.1:22503 ESTABLISHED
tcp 0 0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.1:48398 ESTABLISHED
tcp 0 0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.2:29617 ESTABLISHED
tcp 0 0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.1:32444 ESTABLISHED
tcp 0 0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.1:34415 ESTABLISHED
tcp 0 0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.1:56901 ESTABLISHED
tcp 0 0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.2:45349 ESTABLISHED

Has anyone come across this problem before? Is this a 0.7.2 leak, an LB misconfiguration… ?

Thanks
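
In case it helps with diagnosis, here is a rough sketch of how the netstat output could be tallied per client address, to compare against the roughly 500 connections per node we would expect (1500 workers / 3 nodes). The function name and the SAMPLE text are only illustrative; the sample is the excerpt from this post, and the parsing assumes the six-column layout shown there.

```python
from collections import Counter

# Illustrative input: the netstat excerpt from this post.
SAMPLE = """\
tcp 0 0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.1:22503 ESTABLISHED
tcp 0 0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.1:48398 ESTABLISHED
tcp 0 0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.2:29617 ESTABLISHED
tcp 0 0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.1:32444 ESTABLISHED
tcp 0 0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.1:34415 ESTABLISHED
tcp 0 0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.1:56901 ESTABLISHED
tcp 0 0 kafka1.mycompany.:XmlIpcRegSvc ::ffff:10.99.99.2:45349 ESTABLISHED
"""


def count_established_by_client(netstat_text):
    """Count ESTABLISHED TCP connections per remote address (hypothetical helper)."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Assumed columns: proto, recv-q, send-q, local, remote, state.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[5] != "ESTABLISHED":
            continue
        # Drop the trailing ":port" from the remote endpoint.
        host = fields[4].rsplit(":", 1)[0]
        # Strip the IPv4-mapped IPv6 prefix if present.
        if host.startswith("::ffff:"):
            host = host[len("::ffff:"):]
        counts[host] += 1
    return counts


if __name__ == "__main__":
    for host, n in count_established_by_client(SAMPLE).most_common():
        print(host, n)
```

Fed the full netstat output instead of the excerpt, a per-address count far above 500 on one node would point to connections piling up rather than being spread evenly.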