Further investigation: I have compared the open files and connections of the different nodes. The number of real open files (data dir files) and of established connections is the same on all nodes.
But the affected node has a lot of connections in CLOSE_WAIT state (many thousands) to IPs of external clients (no specific IP), while the other nodes have fewer than 10.

Hi,

I'm running a Kafka cluster with many topics and a constant input of data. The cluster has been running for over a year, but for the past two weeks there has been one node where I see a steady increase in open file descriptors of the Kafka server process. All other nodes show a constant value for this metric. Topics/partitions are distributed equally over all nodes, and all nodes have the same hardware.

The open file limit was reached last week, and the node worked normally after a restart and recovery... but since the restart the file descriptor count has been increasing again.

Any ideas, or things I could do to find out more?

Version: 0.10.2.1

Thanks,
Michael
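A socket sits in CLOSE_WAIT when the remote side has already closed its end and the local process has not yet closed its own, so each one also holds a file descriptor in the broker. To see which clients account for most of them, a minimal sketch like the following could tally CLOSE_WAIT sockets per remote IP. It assumes a Linux host (it reads the kernel tables `/proc/net/tcp` and `/proc/net/tcp6`, where state `08` is CLOSE_WAIT) and that the broker listens on the default port 9092; that port is an assumption and should be changed to the actual listener port. The function names are hypothetical helpers, not part of any Kafka tooling.

```python
import ipaddress
import sys
from collections import Counter

# Value of the "st" column in /proc/net/tcp{,6} for TCP_CLOSE_WAIT (Linux).
CLOSE_WAIT = "08"

# Assumption: the broker listens on Kafka's default port; adjust as needed.
KAFKA_PORT = 9092


def decode_addr(hex_addr: str) -> str:
    """Decode an 'ADDR:PORT' hex field from /proc/net/tcp{,6} to an IP string."""
    ip_hex, _port_hex = hex_addr.split(":")
    raw = bytes.fromhex(ip_hex)
    if sys.byteorder == "little":
        # The kernel prints each 32-bit word in host byte order, so on
        # little-endian hosts every 4-byte group must be reversed.
        raw = b"".join(raw[i:i + 4][::-1] for i in range(0, len(raw), 4))
    if len(raw) == 4:
        return str(ipaddress.IPv4Address(raw))
    return str(ipaddress.IPv6Address(raw))


def close_wait_by_remote_ip(lines, local_port=None):
    """Count CLOSE_WAIT sockets per remote IP, optionally for one local port."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Socket entries start with "N:"; this skips the header line.
        if len(fields) < 4 or not fields[0].endswith(":"):
            continue
        local, remote, state = fields[1], fields[2], fields[3]
        if state != CLOSE_WAIT:
            continue
        if local_port is not None and int(local.split(":")[1], 16) != local_port:
            continue
        counts[decode_addr(remote)] += 1
    return counts


if __name__ == "__main__":
    totals = Counter()
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        try:
            with open(path) as f:
                totals += close_wait_by_remote_ip(f, local_port=KAFKA_PORT)
        except FileNotFoundError:
            pass  # e.g. IPv6 disabled on this host
    print("total CLOSE_WAIT:", sum(totals.values()))
    for ip, n in totals.most_common(10):
        print(f"{n:6d}  {ip}")
```

Running it periodically on the affected node and on a healthy one would show whether the CLOSE_WAIT count grows with the file descriptor count, and whether it is spread over many clients or concentrated on a few.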