https://issues.apache.org/jira/browse/KAFKA-8714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17320671#comment-17320671
GeoffreyStark commented on KAFKA-8714:
--------------------------------------
This may be the same issue as one I created:
https://issues.apache.org/jira/browse/KAFKA-12665
> CLOSE_WAIT connections piling up on the broker
> ----------------------------------------------
>
> Key: KAFKA-8714
> URL: https://issues.apache.org/jira/browse/KAFKA-8714
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.10.1.0, 2.3.0
> Environment: Linux 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24
> 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
> Reporter: Rajdeep Mukherjee
> Priority: Major
> Attachments: Screenshot from 2019-07-25 11-53-24.png,
> consumer_multiprocessing.py, producer_multiprocessing.py
>
>
> We are experiencing an issue where `CLOSE_WAIT` connections pile up on the
> brokers, eventually causing a `Too many open files` error that crashes the
> affected broker. After some digging, we realized that this happens when
> many clients (producers or consumers) close their connections within a
> short interval, i.e. when the rate of client connection closes increases.
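By definition, a socket is in `CLOSE_WAIT` when the peer has closed its end (sent a FIN) and the local application has not yet called `close()`. Every such socket still holds a file descriptor, so that is how they add up to `Too many open files`. The minimal sketch below shows the state on a single loopback connection. It assumes a plain Python socket server, not the Kafka broker itself, and the function name `demo_close_wait` is hypothetical.

```python
import socket


def demo_close_wait(timeout_s=5.0):
    """Reproduce CLOSE_WAIT on one loopback connection (illustrative only)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: the OS picks a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()
    server_side.settimeout(timeout_s)

    # The client closes its end, which sends a FIN to the server side.
    client.close()

    # recv() returning b"" means the peer's FIN arrived. Until
    # server_side.close() is called, the kernel reports this socket as
    # CLOSE_WAIT and its file descriptor stays allocated.
    eof = server_side.recv(1024)
    fd_still_open = server_side.fileno() != -1

    # Closing our end releases the descriptor and moves the socket out of
    # CLOSE_WAIT (to LAST_ACK, then CLOSED).
    server_side.close()
    listener.close()
    return eof, fd_still_open


if __name__ == "__main__":
    print(demo_close_wait())  # (b'', True)
```

If a server never reaches the `close()` step for connections whose peers have gone away, each one stays in `CLOSE_WAIT`, and its descriptor counts toward the process's open-file limit.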
> The actual error we encountered was:
> {code:java}
> [2019-07-18 00:03:27,861] ERROR Error while accepting connection (kafka.network.Acceptor)
> java.io.IOException: Too many open files
>     at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>     at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
>     at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
>     at kafka.network.Acceptor.accept(SocketServer.scala:326)
>     at kafka.network.Acceptor.run(SocketServer.scala:269)
>     at java.lang.Thread.run(Thread.java:745)
> {code}
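`accept()` fails with "Too many open files" (EMFILE) when the process reaches its `RLIMIT_NOFILE` soft limit. Sockets stuck in `CLOSE_WAIT` count against that limit. The sketch below reads the limit and the current descriptor count for the calling Python process. It is an illustration only. To inspect the broker, you would look at the JVM's own `/proc/<pid>/fd` and `/proc/<pid>/limits` instead. The helper name `fd_usage` is hypothetical.

```python
import os
import resource


def fd_usage():
    """Return (open_fds, soft_limit, hard_limit) for the current process."""
    # Every socket, including one in CLOSE_WAIT, uses one descriptor
    # counted against RLIMIT_NOFILE.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    try:
        # Linux-specific; the count includes the descriptor opened to
        # list the directory itself.
        open_fds = len(os.listdir("/proc/self/fd"))
    except FileNotFoundError:
        open_fds = -1  # /proc is not available on this platform
    return open_fds, soft, hard


if __name__ == "__main__":
    print(fd_usage())
```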
> When the error occurred, the broker had 200,000 connections in CLOSE_WAIT
> and approximately 15,000 in ESTABLISHED.
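The reporters counted connections with `lsof`. As an alternative sketch (not what they used), the same per-state counts can be read on Linux from `/proc/net/tcp`. That file lists each IPv4 socket's local address and port in hex, plus a hex state code in which `01` is ESTABLISHED and `08` is CLOSE_WAIT. IPv6 sockets appear separately in `/proc/net/tcp6`. The function name `count_tcp_states` is hypothetical. Filtering on port 9092 (Kafka's default listener port) is an assumption, since the issue does not state the broker's port.

```python
import os
from collections import Counter

# Kernel TCP state codes as printed (in hex) in /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text, local_port=None):
    """Count sockets per TCP state, optionally only for one local port."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        local_addr, state = fields[1], fields[3]
        port = int(local_addr.split(":")[1], 16)  # port is hex-encoded
        if local_port is not None and port != local_port:
            continue
        counts[TCP_STATES.get(state, state)] += 1
    return counts


if __name__ == "__main__" and os.path.exists("/proc/net/tcp"):
    with open("/proc/net/tcp") as f:
        print(count_tcp_states(f.read(), local_port=9092))
```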
> The attached screenshot shows the issue; the sharp dip in the graph marks
> the point where the broker restarted.
> We encountered this problem on both Kafka 0.10.1 and 2.3.0.
> The client versions we used to reproduce it were:
>
> {code:java}
> confluent-kafka==1.1.0
> librdkafka v1.1.0
> {code}
>
> Steps to reproduce:
> The scripts we used to reproduce the issue are attached.
> In our QA environment we were able to reproduce the issue as follows:
> * We spun up a 5-node Kafka v2.3.0 cluster.
> * We wrote a Python script that spawns on the order of 500+ producer
> processes and the same number of consumer processes, with logic to randomly
> close producer and consumer connections at a high rate, on the order of 10
> closes per second, for 5 minutes.
> * On the broker side, we watched for CLOSE_WAIT connections using `lsof`
> and saw CLOSE_WAIT connections that persisted until we restarted Kafka on
> the affected broker.
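The attached scripts are not reproduced here. As a rough sketch of the load described above, the hypothetical helpers below schedule random client closes at a fixed rate. They assume the 10 closes per second apply across all clients (500 producers plus 500 consumers), which the issue does not state explicitly. At that rate, 5 minutes means 10 × 300 = 3000 closes. `close_schedule`, `run_schedule`, and the `close_client` callback are illustrative names, not part of any Kafka client API.

```python
import random
import time


def close_schedule(num_clients, closes_per_second, duration_s, seed=0):
    """Build a list of (time_offset_s, client_index) close events."""
    rng = random.Random(seed)  # seeded so the schedule is reproducible
    interval = 1.0 / closes_per_second
    total = int(closes_per_second * duration_s)
    return [(i * interval, rng.randrange(num_clients)) for i in range(total)]


def run_schedule(events, close_client):
    """Call close_client(index) at each scheduled offset (blocking)."""
    start = time.monotonic()
    for offset, index in events:
        delay = start + offset - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        close_client(index)


if __name__ == "__main__":
    events = close_schedule(num_clients=1000, closes_per_second=10, duration_s=300)
    print(len(events))  # 3000
```

Using `time.monotonic()` as the reference keeps the rate from drifting if an individual close takes longer than expected.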
> The commands to run the producer and consumer scripts are:
> {code:bash}
> python producer_multiprocessing.py <time in seconds> <number of processes> <sleep in seconds between produce> true true
> python consumer_multiprocessing.py <time in seconds> <number of processes> 0 true
> {code}
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)