OK, thanks. I'll retry with a zookeeper cluster.

Paolo.

Paolo Patierno
Senior Software Engineer (IoT) @ Red Hat
Microsoft MVP on Windows Embedded & IoT
Microsoft Azure Advisor
Twitter : @ppatierno
Linkedin : paolopatierno
Blog : DevExperience

> Date: Tue, 10 May 2016 17:59:16 +0200
> From: ra...@gruchalski.com
> To: ppatie...@live.com; users@kafka.apache.org
> Subject: RE: Zookeeper dies ... Kafka server unable to connect
>
> Kafka is expecting the state to be there when the zookeeper comes back. One
> way to protect yourself from what you see happening is to have a zookeeper
> quorum. Run a cluster of 3 zookeepers, then repeat your exercise.
>
> Kafka will continue to work absolutely fine. Just remember, with 3 ZK
> instances, you can only kill one at a time.
> –
> Best regards,
> Radek Gruchalski
> ra...@gruchalski.com
> de.linkedin.com/in/radgruchalski
>
> Confidentiality:
> This communication is intended for the above-named person and may be
> confidential and/or legally privileged.
> If it has come to you in error you must take no action based on it, nor must
> you copy or show it to anyone; please delete/destroy and inform the sender
> immediately.
>
> On May 10, 2016 at 5:56:58 PM, Paolo Patierno (ppatie...@live.com) wrote:
>
> Yes, correct ... the newly restarted zookeeper instance is completely new ...
> it has no information about previous topics and brokers, of course.
>
> > Date: Tue, 10 May 2016 17:55:10 +0200
> > From: ra...@gruchalski.com
> > To: users@kafka.apache.org
> > Subject: RE: Zookeeper dies ... Kafka server unable to connect
> >
> > Ah, but your restarted container does not have any data Kafka recorded
> > previously. Correct?
> >
> > On May 10, 2016 at 5:54:09 PM, Paolo Patierno (ppatie...@live.com) wrote:
> >
> > This is what Kubernetes tells me ...
> >
> > Name: zookeeper
> > Namespace: default
> > Labels: <none>
> > Selector: name=zookeeper
> > Type: ClusterIP
> > IP: 10.0.0.184
> > Port: zookeeper 2181/TCP
> > Endpoints: 172.17.0.4:2181
> > Session Affinity: None
> >
> > So the address is always 10.0.0.184.
> >
> > From the log I understand that the crash is related to the zookeeper pod
> > I closed ... so the kafka server lost its connection to it.
> > From that point on there should be attempts to connect to the new
> > zookeeper, which is up and running with the same IP address as the
> > previous one.
> >
> > > Date: Tue, 10 May 2016 17:49:59 +0200
> > > From: ra...@gruchalski.com
> > > To: users@kafka.apache.org
> > > Subject: Re: Zookeeper dies ... Kafka server unable to connect
> > >
> > > Are you sure you’re getting the same IP address?
> > > Regarding the zookeeper connection being closed, is kubernetes doing a
> > > soft shutdown of your container? If so, zookeeper is asked politely to
> > > stop.
> > >
> > > On May 10, 2016 at 5:47:24 PM, Paolo Patierno (ppatie...@live.com) wrote:
> > >
> > > Hi all,
> > >
> > > experimenting with Kafka on Kubernetes, I get the following error on
> > > Kafka server reconnection ...
> > >
> > > A cluster with one zookeeper and two kafka servers ... I turn off the
> > > zookeeper pod; kubernetes restarts it and guarantees the same IP address
> > > for it, but the kafka server keeps retrying the connection and failing
> > > with the following trace:
> > >
> > > [2016-05-10 15:40:55,046] WARN Session 0x1549b308dd20002 for server 10.0.0.184/10.0.0.184:2181, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> > > java.io.IOException: Connection reset by peer
> > >     at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> > >     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> > >     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> > >     at sun.nio.ch.IOUtil.read(IOUtil.java:192)
> > >     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
> > >     at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
> > >     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
> > >     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
> > > [2016-05-10 15:40:55,149] INFO zookeeper state changed (Disconnected) (org.I0Itec.zkclient.ZkClient)
> > > [2016-05-10 15:40:57,093] INFO Opening socket connection to server 10.0.0.184/10.0.0.184:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
> > > [2016-05-10 15:40:57,093] INFO Socket connection established to 10.0.0.184/10.0.0.184:2181, initiating session (org.apache.zookeeper.ClientCnxn)
> > > [2016-05-10 15:40:57,158] INFO Unable to read additional data from server sessionid 0x1549b308dd20002, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> > > [2016-05-10 15:40:58,936] INFO Opening socket connection to server 10.0.0.184/10.0.0.184:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
> > > [2016-05-10 15:40:58,936] INFO Socket connection established to 10.0.0.184/10.0.0.184:2181, initiating session (org.apache.zookeeper.ClientCnxn)
> > > [2016-05-10 15:40:58,937] INFO Unable to read additional data from server sessionid 0x1549b308dd20002, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> > > [2016-05-10 15:41:00,845] INFO Opening socket connection to server 10.0.0.184/10.0.0.184:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
> > > [2016-05-10 15:41:00,845] INFO Socket connection established to 10.0.0.184/10.0.0.184:2181, initiating session (org.apache.zookeeper.ClientCnxn)
> > > [2016-05-10 15:41:00,846] INFO Unable to read additional data from server sessionid 0x1549b308dd20002, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> > > [2016-05-10 15:41:02,071] INFO Opening socket connection to server 10.0.0.184/10.0.0.184:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
> > > [2016-05-10 15:41:02,071] INFO Socket connection established to 10.0.0.184/10.0.0.184:2181, initiating session (org.apache.zookeeper.ClientCnxn)
> > > [2016-05-10 15:41:02,072] INFO Unable to read additional data from server sessionid 0x1549b308dd20002, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> > > [2016-05-10 15:41:03,336] INFO Opening socket connection to server 10.0.0.184/10.0.0.184:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
> > > [2016-05-10 15:41:03,336] INFO Socket connection established to 10.0.0.184/10.0.0.184:2181, initiating session (org.apache.zookeeper.ClientCnxn)
> > > [2016-05-10 15:41:03,337] INFO Unable to read additional data from server sessionid 0x1549b308dd20002, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> > > [2016-05-10 15:41:05,121] INFO Opening socket connection to server 10.0.0.184/10.0.0.184:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
> > > [2016-05-10 15:41:05,121] INFO Socket connection established to 10.0.0.184/10.0.0.184:2181, initiating session (org.apache.zookeeper.ClientCnxn)
> > > [2016-05-10 15:41:05,122] INFO Unable to read additional data from server sessionid 0x1549b308dd20002, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> > >
> > > You can see where the first zookeeper dies and the connection is lost ...
> > > and all the retries by the kafka server to connect to the new one (same
> > > IP, same port).
> > >
> > > Why does the zookeeper server close the connection? (I can see FIN ACK
> > > frames in Wireshark.)
> > >
> > > Thanks,
> > > Paolo.
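
The advice in this thread to run 3 zookeepers and kill only one at a time follows from majority-quorum arithmetic. The snippet below is a minimal sketch of that arithmetic, assuming the usual majority quorum. The function names are hypothetical and chosen for illustration; they are not part of any ZooKeeper or Kafka API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority (hypothetical helper)."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a majority is still running."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Compare the single-zookeeper setup from the thread with 3 and 5 servers.
    for n in (1, 3, 5):
        print(f"{n} server(s): quorum={quorum_size(n)}, "
              f"can lose {tolerated_failures(n)}")
```

With a single server the tolerated failure count is 0, which matches the setup described in the original question; with 3 servers it is 1.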