Hi,

We have a 3-node ZooKeeper (ZK) ensemble and a 3-node Kafka cluster. Both are hosted on the same 3 VMs.
Before restart:
1. We were on Kafka 0.10.2.1

After restart:
1. We moved to Kafka 1.1

We observe that the Kafka brokers report leadership issues, and for a lot of partitions the leader is -1. I see some logs in ZK that mainly point towards a connectivity issue around restart time.

*We have been stuck on this one for a while now, and a rolling restart of ZK is not helping either. Can you please help or point us to how we can debug this?*

2018-05-11_17:20:49.00305 2018-05-11 17:20:49,002 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection@600] - Notification: 1 (message format version), 1 (n.leader), 0x200000112 (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
2018-05-11_17:20:49.01201 2018-05-11 17:20:49,010 [myid:1] - WARN [WorkerSender[myid=1]:QuorumCnxManager@400] - Cannot open channel to 2 at election address /1.1.1.143:3888
2018-05-11_17:20:49.01203 java.net.ConnectException: Connection refused
2018-05-11_17:20:49.01203 at java.net.PlainSocketImpl.socketConnect(Native Method)
2018-05-11_17:20:49.01203 at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
2018-05-11_17:20:49.01203 at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
2018-05-11_17:20:49.01204 at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
2018-05-11_17:20:49.01204 at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
2018-05-11_17:20:49.01204 at java.net.Socket.connect(Socket.java:589)
2018-05-11_17:20:49.01204 at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:381)
2018-05-11_17:20:49.01204 at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:354)
2018-05-11_17:20:49.01205 at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:452)
2018-05-11_17:20:49.01205 at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:433)
2018-05-11_17:20:49.01206 at java.lang.Thread.run(Thread.java:745)

Rag
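
The "Connection refused" on the election address suggests nothing was listening on port 3888 of that peer at that moment. One rough way to check this from each VM is a plain TCP probe of the ZK ports, sketched below. The function names `port_open` and `probe` are made up for illustration. Only port 3888 appears in the log above; 2181 (client) and 2888 (quorum) are the conventional defaults and are assumptions here, so the actual values in zoo.cfg should be used instead.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def probe(hosts, ports=(2181, 2888, 3888)):
    """Probe every (host, port) pair and return {(host, port): bool}.

    The default ports are assumptions (conventional ZK defaults);
    only 3888 is confirmed by the log in this mail.
    """
    return {(h, p): port_open(h, p) for h in hosts for p in ports}
```

For example, `probe(["1.1.1.143"])` checks the one peer address visible in the log; the addresses of the other two VMs would need to be added to the list. Note that the quorum port (2888 by default) normally listens only on the current leader, so a refused connection there on a follower is expected, whereas a refused connection on 3888 for a running ZK node is not.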