What does the output of lsof -p <broker-pid> show?
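For example, something along these lines (with the broker's actual PID substituted for <broker-pid>) will group the open descriptors by type and also confirm the limit the running broker process actually inherited, which is not always what ulimit -n reports in a login shell:

  lsof -p <broker-pid> | awk '{print $5}' | sort | uniq -c | sort -rn
  grep 'open files' /proc/<broker-pid>/limits

A large count of IPv4/IPv6 entries would point at leaked client connections, while a large count of REG entries would point at log segment and index files.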
-Jaikiran

On Monday 12 September 2016 10:03 PM, Michael Sparr wrote:
5-node Kafka cluster, bare metal, Ubuntu 14.04.x LTS, 64GB RAM, 8-core, 960GB SSD boxes. A single node in the cluster is filling its logs with the following:

[2016-09-12 09:34:49,522] ERROR Error while accepting connection (kafka.network.Acceptor)
java.io.IOException: Too many open files
        at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
        at kafka.network.Acceptor.accept(SocketServer.scala:323)
        at kafka.network.Acceptor.run(SocketServer.scala:268)
        at java.lang.Thread.run(Thread.java:745)

No other nodes in the cluster have this issue. A separate application server runs consumers/producers using librdkafka + the confluent kafka python library, with a few million messages published to under 100 topics.

For days now the /var/log/kafka/kafka.server.log.N files have been filling up with this message and consuming all the disk space, on only this single node in the cluster. I have soft/hard limits at 65,535 for all users, so ulimit -n reveals 65535.

Is there a setting I should add to the librdkafka config in the Python producer clients to shorten socket connections even further and avoid this, or is something else going on? Should I file this as an issue on a GitHub repo and, if so, which project?

Thanks!
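On the broker side, the setting that closes idle client connections is connections.max.idle.ms in server.properties (default 600000 ms). On the client side, librdkafka takes its configuration as the dict passed to the confluent-kafka-python constructors, and each client instance holds its own broker sockets, so creating a new instance per publish multiplies open connections. A minimal sketch of passing config to one long-lived, reused producer; the broker address and topic name below are placeholders, not values from this thread:

from confluent_kafka import Producer

# Any librdkafka property can go in this dict; 'localhost:9092' is a placeholder.
conf = {
    'bootstrap.servers': 'localhost:9092',
    'socket.keepalive.enable': True,  # keep TCP keepalive on the client sockets
}

# Create one Producer per process and reuse it for every publish.
producer = Producer(conf)

def publish(value):
    # 'my-topic' is a placeholder topic name.
    producer.produce('my-topic', value=value)
    producer.poll(0)  # serve delivery callbacks without blocking

# At shutdown, wait for outstanding messages:
# producer.flush()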