We has high GC in one of the brokers when other one was down. Please find zookeeper timeout logs.We are going to do more testing as we think too much logging happening on Kafka side causing high gc(Kafka logs where in DEBUG mode).
2014-04-11 15:46:02 WARN server.ZooKeeperServer - Connection request from old client /10.18.60.25:50027; will be dropped if server is in r-o mode 2014-04-11 15:46:02 INFO server.ZooKeeperServer - Client attempting to establish new session at /10.18.60.25:50027 2014-04-11 15:46:02 INFO server.ZooKeeperServer - Established session 0x2453d08efca0561 with negotiated timeout 30000 for client /10.18.60.25:50027 2014-04-11 15:46:03 INFO server.PrepRequestProcessor - Processed session termination for sessionid: 0x2453d08efca0561 2014-04-11 15:46:05 WARN server.ZooKeeperServer - Connection request from old client /10.18.60.25:50030; will be dropped if server is in r-o mode 2014-04-11 15:46:05 INFO server.ZooKeeperServer - Client attempting to establish new session at /10.18.60.25:50030 2014-04-11 15:46:05 INFO server.ZooKeeperServer - Established session 0x2453d08efca0562 with negotiated timeout 30000 for client /10.18.60.25:50030 2014-04-11 15:46:06 INFO server.PrepRequestProcessor - Processed session termination for sessionid: 0x2453d08efca0562 2014-04-11 15:46:09 INFO server.PrepRequestProcessor - Processed session termination for sessionid: 0x1453d08efa20562 2014-04-11 15:46:10 INFO server.PrepRequestProcessor - Processed session termination for sessionid: 0x1453d08efa20563 2014-04-11 15:46:12 INFO server.PrepRequestProcessor - Processed session termination for sessionid: 0x1453d08efa20564 2014-04-11 15:46:13 WARN server.ZooKeeperServer - Connection request from old client /10.18.60.25:50045; will be dropped if server is in r-o mode 2014-04-11 15:46:13 INFO server.ZooKeeperServer - Client attempting to establish new session at /10.18.60.25:50045 2014-04-11 15:46:13 INFO server.ZooKeeperServer - Established session 0x2453d08efca0563 with negotiated timeout 30000 for client /10.18.60.25:50045 2014-04-11 15:46:13 WARN server.ZooKeeperServer - Connection request from old client /10.18.60.25:50046; will be dropped if server is in r-o mode 2014-04-11 15:46:13 INFO server.ZooKeeperServer - Client attempting to establish new session at /10.18.60.25:50046 2014-04-11 15:46:13 INFO server.ZooKeeperServer - Established session 0x2453d08efca0564 with negotiated timeout 30000 for client /10.18.60.25:50046 2014-04-11 15:46:15 INFO server.PrepRequestProcessor - Processed session termination for sessionid: 0x2453d08efca0563 2014-04-11 15:46:15 INFO server.PrepRequestProcessor - Processed session termination for sessionid: 0x2453d08efca0564 2014-04-11 15:46:18 INFO server.PrepRequestProcessor - Processed session termination for sessionid: 0x1453d08efa20565 2014-04-11 15:46:21 INFO server.PrepRequestProcessor - Processed session termination for sessionid: 0x1453d08efa20566 2014-04-11 15:46:22 WARN server.ZooKeeperServer - Connection request from old client /10.18.60.25:50056; will be dropped if server is in r-o mode 2014-04-11 15:46:22 INFO server.ZooKeeperServer - Client attempting to establish new session at /10.18.60.25:50056 2014-04-11 15:46:22 INFO server.ZooKeeperServer - Established session 0x2453d08efca0565 with negotiated timeout 30000 for client /10.18.60.25:50056 2014-04-11 15:46:24 INFO server.PrepRequestProcessor - Processed session termination for sessionid: 0x2453d08efca0565 2014-04-11 15:46:25 WARN server.ZooKeeperServer - Connection request from old client /10.18.60.25:50059; will be dropped if server is in r-o mode 2014-04-11 15:46:25 INFO server.ZooKeeperServer - Client attempting to establish new session at /10.18.60.25:50059 2014-04-11 15:46:25 INFO server.ZooKeeperServer - Established session 0x2453d08efca0566 with negotiated timeout 30000 for client /10.18.60.25:50059 2014-04-11 15:46:27 INFO server.PrepRequestProcessor - Processed session termination for sessionid: 0x2453d08efca0566 2014-04-11 15:46:28 WARN server.ZooKeeperServer - Connection request from old client /10.18.60.25:50062; will be dropped if server is in r-o mode 2014-04-11 15:46:28 INFO server.ZooKeeperServer - Client attempting to establish new session at /10.18.60.25:50062 2014-04-11 15:46:28 INFO server.ZooKeeperServer - Established session 0x2453d08efca0567 with negotiated timeout 30000 for client /10.18.60.25:50062 2014-04-11 15:46:29 INFO server.PrepRequestProcessor - Processed session termination for sessionid: 0x2453d08efca0567 2014-04-11 15:46:32 INFO server.PrepRequestProcessor - Processed session termination for sessionid: 0x1453d08efa20568 -----Original Message----- From: Guozhang Wang [mailto:wangg...@gmail.com] Sent: Friday, April 11, 2014 7:27 PM To: users@kafka.apache.org Subject: Re: Issue with Upgrade of 0.8.1 You may be hitting https://issues.apache.org/jira/browse/KAFKA-1382 Could you check if you have any long GCs on the server side and session timeouts from the Zookeeper log? Guozhang On Fri, Apr 11, 2014 at 3:10 PM, Seshadri, Balaji <balaji.sesha...@dish.com>wrote: > Please find more errors. > > This is on 102 with 101 shutdown: > > [2014-04-11 16:06:10.029-0600] ERROR > [Controller-2-to-broker-1-send-thread], Controller 2's connection to > broker > id:1,host:tm1-kafkabroker101,port:9092 was unsuccessful > (kafka.controller.RequestSendThread) > java.net.ConnectException: Connection refused > at sun.nio.ch.Net.connect0(Native Method) > at sun.nio.ch.Net.connect(Net.java:465) > at sun.nio.ch.Net.connect(Net.java:457) > at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:670) > at kafka.network.BlockingChannel.connect(BlockingChannel.scala:57) > at > kafka.controller.RequestSendThread.connectToBroker(ControllerChannelManager.scala:173) > at > kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:140) > at > kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:131) > at > kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51) > [2014-04-11 16:06:10.329-0600] ERROR > [Controller-2-to-broker-1-send-thread], Controller 2 epoch 36 failed > to send StopReplica request with correlation id 14902 to broker > id:1,host:tm1-kafkabroker101,port:9092. Reconnecting to broker. > (kafka.controller.RequestSendThread) > java.nio.channels.ClosedChannelException > at kafka.network.BlockingChannel.send(BlockingChannel.scala:89) > at > kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:132) > at > kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:131) > at > kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51) > [2014-04-11 16:06:10.329-0600] ERROR > [Controller-2-to-broker-1-send-thread], Controller 2's connection to > broker > id:1,host:tm1-kafkabroker101,port:9092 was unsuccessful > (kafka.controller.RequestSendThread) > java.net.ConnectException: Connection refused > at sun.nio.ch.Net.connect0(Native Method) > at sun.nio.ch.Net.connect(Net.java:465) > at sun.nio.ch.Net.connect(Net.java:457) > at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:670) > at kafka.network.BlockingChannel.connect(BlockingChannel.scala:57) > at > kafka.controller.RequestSendThread.connectToBroker(ControllerChannelManager.scala:173) > at > kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:140) > at > kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:131) > at > kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51) > > > From: Seshadri, Balaji [mailto:balaji.sesha...@dish.com] > Sent: Friday, April 11, 2014 4:00 PM > To: 'users@kafka.apache.org' > Subject: RE: Issue with Upgrade of 0.8.1 > > Thread Dump attached. > > From: Seshadri, Balaji > Sent: Friday, April 11, 2014 3:36 PM > To: 'users@kafka.apache.org' > Subject: Issue with Upgrade of 0.8.1 > > Hi, > > We upgraded to 0.8.1 version of Kafka in TEST,we did load test > shutting down 1 broker in the cluster,we are getting below error and > cluster becomes unresponsive. > > Do you guys have any fix for this issue ?. > > [2014-04-11 15:10:42.595-0600] ERROR Conditional update of path > /brokers/topics/rain-load-test/partitions/52/state with data > {"controller_epoch":33,"leader":1,"version":1,"leader_epoch":32,"isr": > [1]} > and expected version 55 failed due to > org.apache.zookeeper.KeeperException$BadVersionException: > KeeperErrorCode = BadVersion for > /brokers/topics/rain-load-test/partitions/52/state > (kafka.utils.ZkUtils$) > [2014-04-11 15:10:42.595-0600] INFO Partition [rain-load-test,52] on > broker 1: Cached zkVersion [55] not equal to that in zookeeper, skip > updating ISR (kafka.cluster.Partition) > > > Thanks, > > Balaji > > > > > -- -- Guozhang