[ https://issues.apache.org/jira/browse/KAFKA-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481890#comment-14481890 ]
Jun Rao commented on KAFKA-2096:
--------------------------------

[~allenxwang], that seems to be a good fix. Do you want to submit a patch?

> Enable keepalive socket option for broker to prevent socket leak
> ----------------------------------------------------------------
>
>                 Key: KAFKA-2096
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2096
>             Project: Kafka
>          Issue Type: Improvement
>          Components: network
>    Affects Versions: 0.8.2.1
>            Reporter: Allen Wang
>            Assignee: Jun Rao
>            Priority: Critical
>
> We run a Kafka 0.8.2.1 cluster in AWS with a large number of producers
> (> 10000). Also, the number of producer instances scales up and down
> significantly on a daily basis.
> The issue we found is that after 10 days, the open file descriptor count
> approaches the limit of 32K. An investigation of these open file descriptors
> shows that a significant portion of them are from client instances that were
> terminated during scaling down. Somehow they still show as "ESTABLISHED" in
> netstat. We suspect that the AWS firewall between the client and broker
> causes this issue.
> We attempted to use the "keepalive" socket option to reduce this socket leak
> on the broker, and it appears to be working. Specifically, we added this line
> to kafka.network.Acceptor.accept():
> socketChannel.socket().setKeepAlive(true)
> During our experiment with this change, we confirmed that netstat entries
> for terminated client instances were probed as configured in the operating
> system. After the configured number of probes, the OS determined that the
> peer was no longer alive and the entry was removed, possibly after Kafka hit
> an error reading from the channel and closed it. Our experiment also shows
> that after a few days, the instance was able to keep a stable low point in
> its open file descriptor count, compared with other instances where the low
> point keeps increasing from day to day.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
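
For reference, the one-line change quoted above can be tried outside Kafka with a minimal sketch like the following. It is not the actual kafka.network.Acceptor code: the class name KeepAliveAcceptSketch and the helper acceptWithKeepAlive are hypothetical, and the loopback listener on an ephemeral port is only an assumption to keep the example self-contained.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class KeepAliveAcceptSketch {

    // Hypothetical helper (not Kafka code): accept one connection and enable
    // SO_KEEPALIVE on it, mirroring the line proposed in the issue.
    static SocketChannel acceptWithKeepAlive(ServerSocketChannel server) throws IOException {
        SocketChannel socketChannel = server.accept(); // blocking mode: waits for a client
        socketChannel.socket().setKeepAlive(true);
        return socketChannel;
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            // Assumption for the sketch: listen on loopback, ephemeral port.
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            InetSocketAddress address = (InetSocketAddress) server.getLocalAddress();

            try (SocketChannel client = SocketChannel.open(address);
                 SocketChannel accepted = acceptWithKeepAlive(server)) {
                // Reads the socket option back to confirm it was applied.
                System.out.println("keepalive enabled: " + accepted.socket().getKeepAlive());
            }
        }
    }
}
```

Note that setKeepAlive(true) only turns the option on. When probing starts and how many probes are sent is decided by the operating system, which matches the "as configured in operating system" remark in the description. On Linux, for example, the relevant settings are net.ipv4.tcp_keepalive_time, net.ipv4.tcp_keepalive_intvl and net.ipv4.tcp_keepalive_probes.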