[
https://issues.apache.org/jira/browse/KAFKA-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16148669#comment-16148669
]
Jungbae Jun commented on KAFKA-5778:
------------------------------------
I have experienced this 3 times in a month, with the same version and symptoms.
In the last occurrence, the hung broker was automatically removed from the
replicas after 50 minutes (Kafka was unavailable for those 50 minutes).
However, I couldn't find any error messages in the log files.
> Kafka cluster is not responding when one broker hangs and resulted in too
> many connections in close_wait in other brokers
> -------------------------------------------------------------------------------------------------------------------------
>
> Key: KAFKA-5778
> URL: https://issues.apache.org/jira/browse/KAFKA-5778
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.10.0.1
> Reporter: saichand
> Priority: Blocker
>
> In a cluster of 3 brokers, one of the brokers (broker 1) hung, and from
> then on the other two brokers had connections in CLOSE_WAIT for Java client
> producers/consumers. Some broker-to-broker connections between those two
> brokers were also in CLOSE_WAIT.
> Kafka Version : 0.10.0.1
> In the logs I found "connection refused" exceptions from the replica fetcher threads:
> In broker 0 : replica fetcher 0-1, replica fetcher 0-2
> In broker 2 : replica fetcher 0-0, replica fetcher 0-1
> In broker 1 : it was hung, so no logs were available at that time.
> We tried restarting Kafka on broker 2, but it was not successful: it
> terminated with a ZooKeeper timeout.
> Then we tried restarting Kafka on broker 0 and got the same error.
> Broker 1 was hung, so we could not even log in to it,
> so we restarted the broker 1 machine.
> Then we restarted all ZooKeeper nodes and then the Kafka brokers. Now everything is fine.
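
For anyone trying to confirm the same symptom, here is a minimal sketch for
counting CLOSE_WAIT connections per remote host on a surviving broker. It is
not part of the original report. It assumes a little-endian Linux host, IPv4
only (the format of /proc/net/tcp), and the default Kafka listener port 9092.
The function names are hypothetical.

```python
import socket
from collections import Counter

# State code Linux uses for CLOSE_WAIT in /proc/net/tcp.
CLOSE_WAIT = "08"


def decode_ipv4(hex_addr):
    """Turn '0100007F:2384' into ('127.0.0.1', 9092).

    Assumes a little-endian host, where the address bytes appear reversed.
    """
    hex_ip, hex_port = hex_addr.split(":")
    ip = socket.inet_ntoa(bytes.fromhex(hex_ip)[::-1])
    return ip, int(hex_port, 16)


def count_close_wait(proc_net_tcp_text, local_port=9092):
    """Count CLOSE_WAIT sockets on local_port, grouped by remote IP.

    local_port=9092 is an assumption (Kafka's default listener port).
    """
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4 or fields[3] != CLOSE_WAIT:
            continue
        _, lport = decode_ipv4(fields[1])
        remote_ip, _ = decode_ipv4(fields[2])
        if lport == local_port:
            counts[remote_ip] += 1
    return counts
```

To use it, pass the contents of /proc/net/tcp read on the broker host and
print the result of `most_common()`. A count that keeps growing for one peer
suggests the local process is not closing sockets that the peer has already
closed.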
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)