I tried with a persistent volume as Christian suggested, and it works great for me.
Thanks!

BTW, I still need to explore the zookeeper cluster solution as well.
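For the cluster option, this is a minimal sketch of a zoo.cfg for a 3-node ensemble plus the matching Kafka `zookeeper.connect` value. The hostnames (`zk-0`, `zk-1`, `zk-2`), the dataDir path and the timing values are assumptions for illustration, not settings from this thread. Each server also needs a `myid` file in its dataDir containing its own server id.

```python
# Sketch only: hostnames, ports and dataDir below are placeholders (assumptions),
# not values from this thread.

def render_zoo_cfg(hosts, data_dir="/var/lib/zookeeper/data",
                   client_port=2181, peer_port=2888, election_port=3888):
    """Render a zoo.cfg for an ensemble; server ids are numbered from 1."""
    lines = [
        "tickTime=2000",
        "initLimit=10",
        "syncLimit=5",
        # Snapshots and transaction logs go here, so this path should
        # live on a persistent volume.
        f"dataDir={data_dir}",
        f"clientPort={client_port}",
    ]
    for server_id, host in enumerate(hosts, start=1):
        lines.append(f"server.{server_id}={host}:{peer_port}:{election_port}")
    return "\n".join(lines) + "\n"


def kafka_zookeeper_connect(hosts, client_port=2181):
    """Value for the broker's zookeeper.connect: every ensemble member."""
    return ",".join(f"{host}:{client_port}" for host in hosts)


if __name__ == "__main__":
    zk_hosts = ["zk-0", "zk-1", "zk-2"]  # hypothetical hostnames
    print(render_zoo_cfg(zk_hosts))
    print("zookeeper.connect=" + kafka_zookeeper_connect(zk_hosts))
```

Listing all three members in `zookeeper.connect` lets the broker move to another server when one goes away.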

Paolo.

Paolo Patierno
Senior Software Engineer (IoT) @ Red Hat
Microsoft MVP on Windows Embedded & IoT
Microsoft Azure Advisor
Twitter : @ppatierno
Linkedin : paolopatierno
Blog : DevExperience

> Date: Tue, 10 May 2016 15:57:10 -0700
> Subject: Re: Zookeeper dies ... Kafka server unable to connect
> From: christian.po...@gmail.com
> To: users@kafka.apache.org
> 
> Or retry with a volumeMount/persistentVolume for your single ZK pod.
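A minimal sketch of what that looks like, written as a Python dict dumped to JSON (which kubectl accepts as a manifest). The label `name: zookeeper` and port name `zookeeper` match the Service shown further down; the image, claim name and mount path are hypothetical, so check where your image actually keeps its dataDir. In practice this spec would sit in the pod template of a controller rather than in a bare Pod.

```python
import json

# Hypothetical values: image, claim name and mount path are placeholders.
def zookeeper_pod_manifest(claim_name="zookeeper-data",
                           image="zookeeper-image:tag",
                           data_path="/var/lib/zookeeper"):
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        # Label matches the Service selector name=zookeeper.
        "metadata": {"name": "zookeeper", "labels": {"name": "zookeeper"}},
        "spec": {
            "containers": [{
                "name": "zookeeper",
                "image": image,
                "ports": [{"containerPort": 2181, "name": "zookeeper"}],
                # Mount the claim where zookeeper writes snapshots and
                # transaction logs, so a restarted pod finds the same state.
                "volumeMounts": [{"name": "zk-data", "mountPath": data_path}],
            }],
            "volumes": [{
                "name": "zk-data",
                "persistentVolumeClaim": {"claimName": claim_name},
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(zookeeper_pod_manifest(), indent=2))
```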
> 
> On Tue, May 10, 2016 at 9:01 AM, Paolo Patierno <ppatie...@live.com> wrote:
> 
> > OK, thanks.
> > I'll retry with a zookeeper cluster.
> >
> > Paolo.
> >
> > > Date: Tue, 10 May 2016 17:59:16 +0200
> > > From: ra...@gruchalski.com
> > > To: ppatie...@live.com; users@kafka.apache.org
> > > Subject: RE: Zookeeper dies ... Kafka server unable to connect
> > >
> > > Kafka expects the state to still be there when zookeeper comes back. One way to protect yourself from what you are seeing is to have a zookeeper quorum: run a cluster of 3 zookeepers, then repeat your exercise.
> > >
> > > Kafka will continue to work absolutely fine. Just remember that with 3 ZK instances you can only afford to lose one at a time.
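The "one at a time" limit comes from zookeeper needing a strict majority of the ensemble to be up. A small sketch of that arithmetic (plain majority math, not zookeeper code):

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(f"{n} servers: quorum {quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

A single server tolerates no failures at all, and 4 servers tolerate no more failures than 3, which is why odd ensemble sizes are the usual choice.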
> > > –
> > > Best regards,
> > > Radek Gruchalski
> > > ra...@gruchalski.com
> > > de.linkedin.com/in/radgruchalski
> > >
> > > Confidentiality:
> > > This communication is intended for the above-named person and may be
> > confidential and/or legally privileged.
> > > If it has come to you in error you must take no action based on it, nor
> > must you copy or show it to anyone; please delete/destroy and inform the
> > sender immediately.
> > >
> > > On May 10, 2016 at 5:56:58 PM, Paolo Patierno (ppatie...@live.com) wrote:
> > >
> > > Yes, correct. The restarted zookeeper instance is completely new, so of course it has no information about the previous topics and brokers.
> > >
> > > > Date: Tue, 10 May 2016 17:55:10 +0200
> > > > From: ra...@gruchalski.com
> > > > To: users@kafka.apache.org
> > > > Subject: RE: Zookeeper dies ... Kafka server unable to connect
> > > >
> > > > Ah, but your restarted container does not have any of the data Kafka recorded previously, correct?
> > > >
> > > > On May 10, 2016 at 5:54:09 PM, Paolo Patierno (ppatie...@live.com) wrote:
> > > >
> > > > This is what Kubernetes tells me:
> > > >
> > > > Name: zookeeper
> > > > Namespace: default
> > > > Labels: <none>
> > > > Selector: name=zookeeper
> > > > Type: ClusterIP
> > > > IP: 10.0.0.184
> > > > Port: zookeeper 2181/TCP
> > > > Endpoints: 172.17.0.4:2181
> > > > Session Affinity: None
> > > >
> > > > So the address is always 10.0.0.184.
> > > >
> > > > From the log I understand that the crash is related to the zookeeper pod I shut down, so the kafka server lost its connection to it. From that point on, the log should show the attempts to connect to the new zookeeper, which is up and running with the same IP address as the previous one.
> > > >
> > > > > Date: Tue, 10 May 2016 17:49:59 +0200
> > > > > From: ra...@gruchalski.com
> > > > > To: users@kafka.apache.org
> > > > > Subject: Re: Zookeeper dies ... Kafka server unable to connect
> > > > >
> > > > > Are you sure you’re getting the same IP address?
> > > > > Regarding the zookeeper connection being closed: is kubernetes doing a soft shutdown of your container? If so, zookeeper is asked politely to stop.
> > > > >
> > > > > On May 10, 2016 at 5:47:24 PM, Paolo Patierno (ppatie...@live.com) wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > while experimenting with Kafka on Kubernetes, I get the following error when the Kafka server reconnects.
> > > > >
> > > > > I have a cluster with one zookeeper and two kafka servers. I turn off the zookeeper pod; kubernetes restarts it and guarantees the same IP address for it, but the kafka server starts retrying the connection and keeps failing with the following trace:
> > > > >
> > > > > [2016-05-10 15:40:55,046] WARN Session 0x1549b308dd20002 for server 10.0.0.184/10.0.0.184:2181, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> > > > > java.io.IOException: Connection reset by peer
> > > > > at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> > > > > at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> > > > > at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> > > > > at sun.nio.ch.IOUtil.read(IOUtil.java:192)
> > > > > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
> > > > > at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
> > > > > at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
> > > > > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
> > > > > [2016-05-10 15:40:55,149] INFO zookeeper state changed (Disconnected) (org.I0Itec.zkclient.ZkClient)
> > > > > [2016-05-10 15:40:57,093] INFO Opening socket connection to server 10.0.0.184/10.0.0.184:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
> > > > > [2016-05-10 15:40:57,093] INFO Socket connection established to 10.0.0.184/10.0.0.184:2181, initiating session (org.apache.zookeeper.ClientCnxn)
> > > > > [2016-05-10 15:40:57,158] INFO Unable to read additional data from server sessionid 0x1549b308dd20002, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> > > > > [2016-05-10 15:40:58,936] INFO Opening socket connection to server 10.0.0.184/10.0.0.184:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
> > > > > [2016-05-10 15:40:58,936] INFO Socket connection established to 10.0.0.184/10.0.0.184:2181, initiating session (org.apache.zookeeper.ClientCnxn)
> > > > > [2016-05-10 15:40:58,937] INFO Unable to read additional data from server sessionid 0x1549b308dd20002, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> > > > > [2016-05-10 15:41:00,845] INFO Opening socket connection to server 10.0.0.184/10.0.0.184:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
> > > > > [2016-05-10 15:41:00,845] INFO Socket connection established to 10.0.0.184/10.0.0.184:2181, initiating session (org.apache.zookeeper.ClientCnxn)
> > > > > [2016-05-10 15:41:00,846] INFO Unable to read additional data from server sessionid 0x1549b308dd20002, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> > > > > [2016-05-10 15:41:02,071] INFO Opening socket connection to server 10.0.0.184/10.0.0.184:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
> > > > > [2016-05-10 15:41:02,071] INFO Socket connection established to 10.0.0.184/10.0.0.184:2181, initiating session (org.apache.zookeeper.ClientCnxn)
> > > > > [2016-05-10 15:41:02,072] INFO Unable to read additional data from server sessionid 0x1549b308dd20002, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> > > > > [2016-05-10 15:41:03,336] INFO Opening socket connection to server 10.0.0.184/10.0.0.184:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
> > > > > [2016-05-10 15:41:03,336] INFO Socket connection established to 10.0.0.184/10.0.0.184:2181, initiating session (org.apache.zookeeper.ClientCnxn)
> > > > > [2016-05-10 15:41:03,337] INFO Unable to read additional data from server sessionid 0x1549b308dd20002, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> > > > > [2016-05-10 15:41:05,121] INFO Opening socket connection to server 10.0.0.184/10.0.0.184:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
> > > > > [2016-05-10 15:41:05,121] INFO Socket connection established to 10.0.0.184/10.0.0.184:2181, initiating session (org.apache.zookeeper.ClientCnxn)
> > > > > [2016-05-10 15:41:05,122] INFO Unable to read additional data from server sessionid 0x1549b308dd20002, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> > > > >
> > > > > You can see when the first zookeeper dies and the connection is lost, followed by all the retries by the kafka server to connect to the new one (same IP, same port).
> > > > >
> > > > > Why does the zookeeper server close the connection? (I can see FIN ACK frames in Wireshark.)
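If I remember the ZooKeeper 3.4 server correctly (worth checking against `ZooKeeperServer.processConnectRequest`), a server closes a new connection when the client reports a last-seen zxid newer than the server's own last processed zxid. A restarted server with an empty dataDir has nothing, so it would refuse the broker on every retry, which fits the FIN ACKs and the repeated "Unable to read additional data" lines. This is a toy model of that check under that assumption, not zookeeper's code; the zxid values are hypothetical since the log does not show them.

```python
from dataclasses import dataclass


@dataclass
class ZkServerState:
    # 0 for a brand-new server with an empty data directory.
    last_processed_zxid: int


def accepts_reconnect(server: ZkServerState, client_last_zxid_seen: int) -> bool:
    """Toy model: refuse a client that has already seen newer state than the server."""
    return client_last_zxid_seen <= server.last_processed_zxid


if __name__ == "__main__":
    seen_by_broker = 0x1A  # hypothetical value, not taken from the log
    fresh = ZkServerState(last_processed_zxid=0)        # restarted without a volume
    restored = ZkServerState(last_processed_zxid=0x1A)  # restarted from persisted data
    print(accepts_reconnect(fresh, seen_by_broker))     # False
    print(accepts_reconnect(restored, seen_by_broker))  # True
```

Keeping the dataDir on a persistent volume (or running a quorum) means the server the broker reconnects to still has the state the broker has already seen.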
> > > > >
> > > > > Thanks,
> > > > > Paolo.
> >
> >
> 
> 
> 
> -- 
> *Christian Posta*
> twitter: @christianposta
> http://www.christianposta.com/blog
> http://fabric8.io
                                          
