You may be hitting https://issues.apache.org/jira/browse/KAFKA-1382
Could you check if you have any long GCs on the server side and session timeouts from the Zookeeper log?

Guozhang


On Fri, Apr 11, 2014 at 3:10 PM, Seshadri, Balaji <balaji.sesha...@dish.com> wrote:

> Please find more errors.
>
> This is on 102 with 101 shutdown:
>
> [2014-04-11 16:06:10.029-0600] ERROR
> [Controller-2-to-broker-1-send-thread], Controller 2's connection to broker
> id:1,host:tm1-kafkabroker101,port:9092 was unsuccessful
> (kafka.controller.RequestSendThread)
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.Net.connect0(Native Method)
>     at sun.nio.ch.Net.connect(Net.java:465)
>     at sun.nio.ch.Net.connect(Net.java:457)
>     at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:670)
>     at kafka.network.BlockingChannel.connect(BlockingChannel.scala:57)
>     at kafka.controller.RequestSendThread.connectToBroker(ControllerChannelManager.scala:173)
>     at kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:140)
>     at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:131)
>     at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)
> [2014-04-11 16:06:10.329-0600] ERROR
> [Controller-2-to-broker-1-send-thread], Controller 2 epoch 36 failed to
> send StopReplica request with correlation id 14902 to broker
> id:1,host:tm1-kafkabroker101,port:9092. Reconnecting to broker.
> (kafka.controller.RequestSendThread)
> java.nio.channels.ClosedChannelException
>     at kafka.network.BlockingChannel.send(BlockingChannel.scala:89)
>     at kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:132)
>     at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:131)
>     at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)
> [2014-04-11 16:06:10.329-0600] ERROR
> [Controller-2-to-broker-1-send-thread], Controller 2's connection to broker
> id:1,host:tm1-kafkabroker101,port:9092 was unsuccessful
> (kafka.controller.RequestSendThread)
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.Net.connect0(Native Method)
>     at sun.nio.ch.Net.connect(Net.java:465)
>     at sun.nio.ch.Net.connect(Net.java:457)
>     at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:670)
>     at kafka.network.BlockingChannel.connect(BlockingChannel.scala:57)
>     at kafka.controller.RequestSendThread.connectToBroker(ControllerChannelManager.scala:173)
>     at kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:140)
>     at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:131)
>     at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)
>
>
> From: Seshadri, Balaji [mailto:balaji.sesha...@dish.com]
> Sent: Friday, April 11, 2014 4:00 PM
> To: 'users@kafka.apache.org'
> Subject: RE: Issue with Upgrade of 0.8.1
>
> Thread Dump attached.
>
> From: Seshadri, Balaji
> Sent: Friday, April 11, 2014 3:36 PM
> To: 'users@kafka.apache.org'
> Subject: Issue with Upgrade of 0.8.1
>
> Hi,
>
> We upgraded to 0.8.1 version of Kafka in TEST, we did load test shutting
> down 1 broker in the cluster, we are getting below error and cluster becomes
> unresponsive.
>
> Do you guys have any fix for this issue?
>
> [2014-04-11 15:10:42.595-0600] ERROR Conditional update of path
> /brokers/topics/rain-load-test/partitions/52/state with data
> {"controller_epoch":33,"leader":1,"version":1,"leader_epoch":32,"isr":[1]}
> and expected version 55 failed due to
> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode =
> BadVersion for /brokers/topics/rain-load-test/partitions/52/state
> (kafka.utils.ZkUtils$)
> [2014-04-11 15:10:42.595-0600] INFO Partition [rain-load-test,52] on
> broker 1: Cached zkVersion [55] not equal to that in zookeeper, skip
> updating ISR (kafka.cluster.Partition)
>
>
> Thanks,
>
> Balaji
>

--
-- Guozhang
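
For anyone triaging the same symptom, here is a minimal sketch (not part of Kafka) that counts the "Cached zkVersion ... skip updating ISR" messages per broker and partition, to see how widespread the stale-version condition is. It assumes the log line has the exact shape quoted in this thread once the mail client's line wraps are undone; the function name `find_stale_zkversion` and the `sample` list are hypothetical and only for illustration.

```python
import re
from collections import Counter

# Pattern for the INFO line quoted in this thread (assumed format, unwrapped).
STALE_ZK = re.compile(
    r"\[(?P<ts>[^\]]+)\] INFO Partition \[(?P<topic>[^,\]]+),(?P<partition>\d+)\] on "
    r"broker (?P<broker>\d+): Cached zkVersion \[(?P<version>\d+)\] not equal to that "
    r"in zookeeper, skip updating ISR"
)


def find_stale_zkversion(lines):
    """Count stale-zkVersion messages per (broker id, topic, partition)."""
    counts = Counter()
    for line in lines:
        m = STALE_ZK.search(line)
        if m:
            key = (int(m["broker"]), m["topic"], int(m["partition"]))
            counts[key] += 1
    return counts


# Two lines taken from the logs quoted above, with the mail wraps removed.
sample = [
    "[2014-04-11 15:10:42.595-0600] INFO Partition [rain-load-test,52] on broker 1: "
    "Cached zkVersion [55] not equal to that in zookeeper, skip updating ISR "
    "(kafka.cluster.Partition)",
    "[2014-04-11 16:06:10.029-0600] ERROR [Controller-2-to-broker-1-send-thread], "
    "Controller 2's connection to broker id:1,host:tm1-kafkabroker101,port:9092 "
    "was unsuccessful (kafka.controller.RequestSendThread)",
]

if __name__ == "__main__":
    for key, n in find_stale_zkversion(sample).items():
        print(key, n)
```

To run it against a real broker log, pass an open file object instead of `sample`. A count that keeps growing for the same partitions would be consistent with the broker retrying with an outdated cached version, which is the situation the question about GC pauses and ZooKeeper session timeouts is trying to narrow down.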