Joseph Aliase created KAFKA-5007:
------------------------------------

             Summary: Kafka Replica Fetcher Thread- Resource Leak
                 Key: KAFKA-5007
                 URL: https://issues.apache.org/jira/browse/KAFKA-5007
             Project: Kafka
          Issue Type: Bug
          Components: core, network
    Affects Versions: 0.10.1.1
         Environment: CentOS 7, Java 8
            Reporter: Joseph Aliase


Kafka runs out of open file descriptors when the system network interface is 
down.

Issue description:
We have a five-node Kafka cluster running version 0.10.1.1. The open file 
descriptor limit for the account running Kafka is set to 100000.

During an upgrade, the network interface went down. The outage continued for 
12 hours, and eventually all the brokers crashed with a 
java.io.IOException: Too many open files error.

We repeated the test in a lower environment and observed that the open socket 
count keeps increasing while the NIC is down.
We have around 13 topics with a maximum partition count of 120, and the number 
of replica fetcher threads is set to 8.

Using an internal monitoring tool, we observed that the open socket descriptor 
count for the broker PID continued to increase while the NIC was down, 
eventually leading to the "Too many open files" error.
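
For anyone trying to reproduce the observation without our internal tool, the 
socket count for a broker PID can be approximated on Linux by reading 
/proc/<pid>/fd. The following is only a minimal sketch, not the tool we used: 
the function name count_open_fds and its proc_root parameter are hypothetical, 
and it assumes a Linux host and permission to read the broker's /proc entries.

```python
import os


def count_open_fds(pid, proc_root="/proc"):
    """Return (total_fds, socket_fds) for a process.

    Hypothetical helper: reads the symlinks under <proc_root>/<pid>/fd,
    where Linux exposes sockets as links of the form "socket:[inode]".
    """
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    total = 0
    sockets = 0
    for name in os.listdir(fd_dir):
        total += 1
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        if target.startswith("socket:"):
            sockets += 1
    return total, sockets
```

Calling count_open_fds(<broker pid>) at a fixed interval while the NIC is down 
and logging the second value should show whether the socket count keeps 
climbing toward the configured limit.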



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
