Further investigations:

I have compared the open files and connections across the nodes. The number of
real open files (files in the data dir) and of established connections is the
same on all nodes.

But the affected node has many thousands of connections in "CLOSE_WAIT" state
to IPs of external clients (no single IP stands out). The other nodes have
fewer than 10 each.
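
CLOSE_WAIT means the remote side has closed the connection but the local
process has not yet closed its socket, so each of these still holds a file
descriptor. To break them down per client IP, something like the following
could be used. It is only a minimal sketch: it assumes Linux `netstat -tan`
style output (columns Proto, Recv-Q, Send-Q, Local Address, Foreign Address,
State), and the function name and the sample addresses are made up for
illustration.

```python
from collections import Counter


def count_close_wait(netstat_output):
    """Count CLOSE_WAIT sockets per remote IP from `netstat -tan`-style text."""
    per_ip = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed columns: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skips the header line and anything that is not TCP
        if fields[5] != "CLOSE_WAIT":
            continue
        # Drop the port, i.e. everything after the last ':'
        remote_ip = fields[4].rsplit(":", 1)[0]
        per_ip[remote_ip] += 1
    return per_ip


# Made-up sample input; on a real node the text would come from `netstat -tan`.
sample = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.0.5:9092           192.0.2.10:51234        ESTABLISHED
tcp        1      0 10.0.0.5:9092           192.0.2.11:40022        CLOSE_WAIT
tcp        1      0 10.0.0.5:9092           192.0.2.11:40023        CLOSE_WAIT
tcp6       1      0 10.0.0.5:9092           198.51.100.7:33000      CLOSE_WAIT
"""

counts = count_close_wait(sample)
print("total CLOSE_WAIT:", sum(counts.values()))
for ip, n in counts.most_common(10):
    print(ip, n)
```

Running this periodically and comparing the totals should show whether the
CLOSE_WAIT count grows at the same rate as the open file descriptors.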


    Hi,
    
    I’m running a Kafka cluster with many topics and constant input of data.
    The cluster has been running for over one year, but for the last 2 weeks
    there has been one node where I see a steady increase in open file
    descriptors of the Kafka server process.
    All other nodes show a constant value for this metric. Topics/partitions
    are distributed equally over all nodes, and the hardware is the same.
    
    The open file limit was reached last week, and the node worked normally
    after restart and recovery, but since the restart the file descriptors
    have been increasing again.
    
    Any idea or things to do to find out more?
    
    Version: 0.10.2.1
    
    Thanks,
    Michael
    
