Kafka
internally only handles session expired event but there are many bad
situations of zookeepers where Kafka can go haywire.

Interesting. Would you mind giving more details about this? Kafka does
handle events other than session expirations through the zkclient library.
Also, Zookeeper only loses ephemeral nodes on session expirations. It does
lose ephemeral nodes in other circumstances, when the cluster starves on
I/O, but those expose zookeeper corner case bugs :-)

Thanks,
Neha


On Fri, Mar 21, 2014 at 7:51 AM, Jun Rao <jun...@gmail.com> wrote:

> Hi, Jae,
>
> My understanding is that ephemeral nodes can only be lost on ZK session
> expiration. Are you saying that there are other cases when ephemeral nodes
> can be lost?
>
> Thanks,
>
> Jun
>
>
> On Thu, Mar 20, 2014 at 9:52 PM, Bae, Jae Hyeon <metac...@gmail.com>
> wrote:
>
> > This issue is zookeeper resiliency.
> >
> > What I have done is, ephemeral node creation is replaced by Apache
> > Curator's PersistentEphemeralNode recipe, to reinstate ephemeral nodes
> > after zookeeper blip. Also, all watchers also should be reinstated. Kafka
> > internally only handles session expired event but there are many bad
> > situations of zookeepers where Kafka can go haywire.
> >
> >
> >
> >
> > On Thu, Mar 20, 2014 at 9:23 PM, Neha Narkhede <neha.narkh...@gmail.com
> > >wrote:
> >
> > > 1.  What does 'zookeeper state changed (Expired)' mean?
> > >
> > > 2.  Has anyone seen issues like this before?  Where zookeeper
> connections
> > > are flaky enough to cause leader elections?
> > >
> > > It means zookeeper expired the session. The most common reason for this
> > is
> > > client side GC (in your case, client is the Kafka broker) though if
> your
> > > zookeeper setup is flaky, this might even point to a server side I/O
> > > bottleneck or GC. Could you inspect the broker and zookeeper's gc logs?
> > >
> > >
> > > On Thu, Mar 20, 2014 at 3:26 PM, Andrew Otto <o...@wikimedia.org>
> wrote:
> > >
> > > > Hm, just saw something a little fishy.
> > > >
> > > > (In the following logs, analytics1021 (id 21) and analytics1022 (id
> 22)
> > > > are Brokers and analytics1023,analytics1024,analytics1025 are
> > > Zookeepers.)
> > > >
> > > > At 2014-03-20 21:12:26, analytics1021 lost its connection to
> zookeeper.
> > > It
> > > > reconnected to analytics1023, but as it was doing so, it logged a
> > > > 'zookeeper state changed (Expired)' message, which apparently caused
> > the
> > > > newly opened socket to close again.  Am I reading that right?
> > > >
> > > > In the time that it took analytics1021 to finally re-establish a
> > > zookeeper
> > > > connection, analytics1022 had become the leader for all partitions.
>  I
> > > can
> > > > rebalance the leaders, but I'd prefer if this didn't happen in the
> > first
> > > > place.
> > > >
> > > > 1.  What does 'zookeeper state changed (Expired)' mean?
> > > > 2.  Has anyone seen issues like this before?  Where zookeeper
> > connections
> > > > are flaky enough to cause leader elections?
> > > >
> > > > Thanks!
> > > > -Andrew Otto
> > > >
> > > >
> > > >
> > > > [2014-03-20 21:12:26,427] 685698013
> > > > [main-SendThread(analytics1025.eqiad.wmnet:2181)] INFO
> > > > org.apache.zookeeper.ClientCnxn - Client session timed out, have not
> > > heard
> > > > from server in 10067ms for sessionid 0x1425d903a3f1f89, closing
> socket
> > > > connection and attempting reconnect
> > > > [2014-03-20 21:12:26,528] 685698114 [main-EventThread] INFO
> > > > org.I0Itec.zkclient.ZkClient - zookeeper state changed (Disconnected)
> > > > [2014-03-20 21:12:27,252] 685698838
> > > > [main-SendThread(analytics1025.eqiad.wmnet:2181)] INFO
> > > > org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> > > > analytics1023.eqiad.wmnet/10.64.5.15:2181
> > > > [2014-03-20 21:12:27,271] 685698857
> > > > [main-SendThread(analytics1023.eqiad.wmnet:2181)] INFO
> > > > org.apache.zookeeper.ClientCnxn - Socket connection established to
> > > > analytics1023.eqiad.wmnet/10.64.5.15:2181, initiating session
> > > > [2014-03-20 21:12:27,273] 685698859 [main-EventThread] INFO
> > > > org.I0Itec.zkclient.ZkClient - zookeeper state changed (Expired)
> > > > [2014-03-20 21:12:27,273] 685698859
> > > > [main-SendThread(analytics1023.eqiad.wmnet:2181)] INFO
> > > > org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper
> > > service,
> > > > session 0x1425d903a3f1f89 has expired, closing socket connection
> > > > [2014-03-20 21:12:27,273] 685698859 [main-EventThread] INFO
> > > > org.apache.zookeeper.ZooKeeper - Initiating client connection,
> > > >
> > >
> >
> connectString=analytics1023.eqiad.wmnet,analytics1024.eqiad.wmnet,analytics1025.eqiad.wmnet/kafka/eqiad
> > > > sessionTimeout=6000 watcher=org.I0Itec.zkclient.ZkClient@63203b59
> > > > [2014-03-20 21:12:27,290] 685698876
> > > >
> > >
> >
> [ZkClient-EventThread-24-analytics1023.eqiad.wmnet,analytics1024.eqiad.wmnet,analytics1025.eqiad.wmnet/kafka/eqiad]
> > > > INFO kafka.server.KafkaZooKeeper - re-registering broker info in ZK
> for
> > > > broker 21
> > > > [2014-03-20 21:12:27,290] 685698876 [main-EventThread] INFO
> > > > org.apache.zookeeper.ClientCnxn - EventThread shut down
> > > > [2014-03-20 21:12:27,290] 685698876 [main-SendThread()] INFO
> > > > org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> > > > analytics1023.eqiad.wmnet/10.64.5.15:2181
> > > > [2014-03-20 21:12:27,292] 685698878
> > > > [main-SendThread(analytics1023.eqiad.wmnet:2181)] INFO
> > > > org.apache.zookeeper.ClientCnxn - Socket connection established to
> > > > analytics1023.eqiad.wmnet/10.64.5.15:2181, initiating session
> > > > [2014-03-20 21:12:27,313] 685698899
> > > > [main-SendThread(analytics1023.eqiad.wmnet:2181)] INFO
> > > > org.apache.zookeeper.ClientCnxn - Session establishment complete on
> > > server
> > > > analytics1023.eqiad.wmnet/10.64.5.15:2181, sessionid =
> > > > 0xff429b4201dd2352, negotiated timeout = 6000
> > > > [2014-03-20 21:12:27,314] 685698900 [main-EventThread] INFO
> > > > org.I0Itec.zkclient.ZkClient - zookeeper state changed
> (SyncConnected)
> > > > [2014-03-20 21:12:27,324] 685698910
> > > >
> > >
> >
> [ZkClient-EventThread-24-analytics1023.eqiad.wmnet,analytics1024.eqiad.wmnet,analytics1025.eqiad.wmnet/kafka/eqiad]
> > > > INFO kafka.server.KafkaZooKeeper - done re-registering broker
> > > > [2014-03-20 21:12:27,324] 685698910
> > > >
> > >
> >
> [ZkClient-EventThread-24-analytics1023.eqiad.wmnet,analytics1024.eqiad.wmnet,analytics1025.eqiad.wmnet/kafka/eqiad]
> > > > INFO kafka.server.KafkaZooKeeper - Subscribing to /brokers/topics
> path
> > to
> > > > watch for new topics
> > > > [2014-03-20 21:12:27,370] 685698956
> > > >
> > >
> >
> [ZkClient-EventThread-24-analytics1023.eqiad.wmnet,analytics1024.eqiad.wmnet,analytics1025.eqiad.wmnet/kafka/eqiad]
> > > > INFO kafka.server.ZookeeperLeaderElector$LeaderChangeListener - New
> > > leader
> > > > is 22
> > > >
> > > >
> > > >
> > >
> >
>

Reply via email to