Hi, Jae,

My understanding is that ephemeral nodes can only be lost on ZK session
expiration. Are you saying that there are other cases when ephemeral nodes
can be lost?

Thanks,

Jun


On Thu, Mar 20, 2014 at 9:52 PM, Bae, Jae Hyeon <metac...@gmail.com> wrote:

> This issue is zookeeper resiliency.
>
> What I have done is, ephemeral node creation is replaced by Apache
> Curator's PersistentEphemeralNode recipe, to reinstate ephemeral nodes
> after zookeeper blip. Also, all watchers also should be reinstated. Kafka
> internally only handles session expired event but there are many bad
> situations of zookeepers where Kafka can go haywire.
>
>
>
>
> On Thu, Mar 20, 2014 at 9:23 PM, Neha Narkhede <neha.narkh...@gmail.com
> >wrote:
>
> > 1.  What does 'zookeeper state changed (Expired)' mean?
> >
> > 2.  Has anyone seen issues like this before?  Where zookeeper connections
> > are flaky enough to cause leader elections?
> >
> > It means zookeeper expired the session. The most common reason for this
> is
> > client side GC (in your case, client is the Kafka broker) though if your
> > zookeeper setup is flaky, this might even point to a server side I/O
> > bottleneck or GC. Could you inspect the broker and zookeeper's gc logs?
> >
> >
> > On Thu, Mar 20, 2014 at 3:26 PM, Andrew Otto <o...@wikimedia.org> wrote:
> >
> > > Hm, just saw something a little fishy.
> > >
> > > (In the following logs, analytics1021 (id 21) and analytics1022 (id 22)
> > > are Brokers and analytics1023,analytics1024,analytics1025 are
> > Zookeepers.)
> > >
> > > At 2014-03-20 21:12:26, analytics1021 lost its connection to zookeeper.
> > It
> > > reconnected to analytics1023, but as it was doing so, it logged a
> > > 'zookeeper state changed (Expired)' message, which apparently caused
> the
> > > newly opened socket to close again.  Am I reading that right?
> > >
> > > In the time that it took analytics1021 to finally re-establish a
> > zookeeper
> > > connection, analytics1022 had become the leader for all partitions.  I
> > can
> > > rebalance the leaders, but I'd prefer if this didn't happen in the
> first
> > > place.
> > >
> > > 1.  What does 'zookeeper state changed (Expired)' mean?
> > > 2.  Has anyone seen issues like this before?  Where zookeeper
> connections
> > > are flaky enough to cause leader elections?
> > >
> > > Thanks!
> > > -Andrew Otto
> > >
> > >
> > >
> > > [2014-03-20 21:12:26,427] 685698013
> > > [main-SendThread(analytics1025.eqiad.wmnet:2181)] INFO
> > > org.apache.zookeeper.ClientCnxn - Client session timed out, have not
> > heard
> > > from server in 10067ms for sessionid 0x1425d903a3f1f89, closing socket
> > > connection and attempting reconnect
> > > [2014-03-20 21:12:26,528] 685698114 [main-EventThread] INFO
> > > org.I0Itec.zkclient.ZkClient - zookeeper state changed (Disconnected)
> > > [2014-03-20 21:12:27,252] 685698838
> > > [main-SendThread(analytics1025.eqiad.wmnet:2181)] INFO
> > > org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> > > analytics1023.eqiad.wmnet/10.64.5.15:2181
> > > [2014-03-20 21:12:27,271] 685698857
> > > [main-SendThread(analytics1023.eqiad.wmnet:2181)] INFO
> > > org.apache.zookeeper.ClientCnxn - Socket connection established to
> > > analytics1023.eqiad.wmnet/10.64.5.15:2181, initiating session
> > > [2014-03-20 21:12:27,273] 685698859 [main-EventThread] INFO
> > > org.I0Itec.zkclient.ZkClient - zookeeper state changed (Expired)
> > > [2014-03-20 21:12:27,273] 685698859
> > > [main-SendThread(analytics1023.eqiad.wmnet:2181)] INFO
> > > org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper
> > service,
> > > session 0x1425d903a3f1f89 has expired, closing socket connection
> > > [2014-03-20 21:12:27,273] 685698859 [main-EventThread] INFO
> > > org.apache.zookeeper.ZooKeeper - Initiating client connection,
> > >
> >
> connectString=analytics1023.eqiad.wmnet,analytics1024.eqiad.wmnet,analytics1025.eqiad.wmnet/kafka/eqiad
> > > sessionTimeout=6000 watcher=org.I0Itec.zkclient.ZkClient@63203b59
> > > [2014-03-20 21:12:27,290] 685698876
> > >
> >
> [ZkClient-EventThread-24-analytics1023.eqiad.wmnet,analytics1024.eqiad.wmnet,analytics1025.eqiad.wmnet/kafka/eqiad]
> > > INFO kafka.server.KafkaZooKeeper - re-registering broker info in ZK for
> > > broker 21
> > > [2014-03-20 21:12:27,290] 685698876 [main-EventThread] INFO
> > > org.apache.zookeeper.ClientCnxn - EventThread shut down
> > > [2014-03-20 21:12:27,290] 685698876 [main-SendThread()] INFO
> > > org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> > > analytics1023.eqiad.wmnet/10.64.5.15:2181
> > > [2014-03-20 21:12:27,292] 685698878
> > > [main-SendThread(analytics1023.eqiad.wmnet:2181)] INFO
> > > org.apache.zookeeper.ClientCnxn - Socket connection established to
> > > analytics1023.eqiad.wmnet/10.64.5.15:2181, initiating session
> > > [2014-03-20 21:12:27,313] 685698899
> > > [main-SendThread(analytics1023.eqiad.wmnet:2181)] INFO
> > > org.apache.zookeeper.ClientCnxn - Session establishment complete on
> > server
> > > analytics1023.eqiad.wmnet/10.64.5.15:2181, sessionid =
> > > 0xff429b4201dd2352, negotiated timeout = 6000
> > > [2014-03-20 21:12:27,314] 685698900 [main-EventThread] INFO
> > > org.I0Itec.zkclient.ZkClient - zookeeper state changed (SyncConnected)
> > > [2014-03-20 21:12:27,324] 685698910
> > >
> >
> [ZkClient-EventThread-24-analytics1023.eqiad.wmnet,analytics1024.eqiad.wmnet,analytics1025.eqiad.wmnet/kafka/eqiad]
> > > INFO kafka.server.KafkaZooKeeper - done re-registering broker
> > > [2014-03-20 21:12:27,324] 685698910
> > >
> >
> [ZkClient-EventThread-24-analytics1023.eqiad.wmnet,analytics1024.eqiad.wmnet,analytics1025.eqiad.wmnet/kafka/eqiad]
> > > INFO kafka.server.KafkaZooKeeper - Subscribing to /brokers/topics path
> to
> > > watch for new topics
> > > [2014-03-20 21:12:27,370] 685698956
> > >
> >
> [ZkClient-EventThread-24-analytics1023.eqiad.wmnet,analytics1024.eqiad.wmnet,analytics1025.eqiad.wmnet/kafka/eqiad]
> > > INFO kafka.server.ZookeeperLeaderElector$LeaderChangeListener - New
> > leader
> > > is 22
> > >
> > >
> > >
> >
>

Reply via email to