I see, that makes sense. Thanks a lot for clarifying!

-Neha


On Fri, Mar 21, 2014 at 11:01 AM, Bae, Jae Hyeon <metac...@gmail.com> wrote:

> Let me clarify the situation. I forgot to mention that my case might not be
> general one because Netflix is using Apache Curator as the main zookeeper
> client and ZkClient in Kafka should be bridged to Apache Curator, so the
> behavior I have seen might not be general one.
>
> Kafka's ZKSessionExpireListener.handleNewSession() is reinstating all
> ephemeral nodes and watchers but handleNewSession() was not kicked in my
> case. So, I created Netflix internal version of ZkClient to replace
> ephemeral node creation and watcher reinstating.
>
> I have a plan to remove all external dependency from Kafka soon and I might
> be able to mention more about Kafka's zk resiliency.
>
>
> On Fri, Mar 21, 2014 at 7:51 AM, Jun Rao <jun...@gmail.com> wrote:
>
> > Hi, Jae,
> >
> > My understanding is that ephemeral nodes can only be lost on ZK session
> > expiration. Are you saying that there are other cases when ephemeral
> nodes
> > can be lost?
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Thu, Mar 20, 2014 at 9:52 PM, Bae, Jae Hyeon <metac...@gmail.com>
> > wrote:
> >
> > > This issue is zookeeper resiliency.
> > >
> > > What I have done is, ephemeral node creation is replaced by Apache
> > > Curator's PersistentEphemeralNode recipe, to reinstate ephemeral nodes
> > > after zookeeper blip. Also, all watchers also should be reinstated.
> Kafka
> > > internally only handles session expired event but there are many bad
> > > situations of zookeepers where Kafka can go haywire.
> > >
> > >
> > >
> > >
> > > On Thu, Mar 20, 2014 at 9:23 PM, Neha Narkhede <
> neha.narkh...@gmail.com
> > > >wrote:
> > >
> > > > 1.  What does 'zookeeper state changed (Expired)' mean?
> > > >
> > > > 2.  Has anyone seen issues like this before?  Where zookeeper
> > connections
> > > > are flaky enough to cause leader elections?
> > > >
> > > > It means zookeeper expired the session. The most common reason for
> this
> > > is
> > > > client side GC (in your case, client is the Kafka broker) though if
> > your
> > > > zookeeper setup is flaky, this might even point to a server side I/O
> > > > bottleneck or GC. Could you inspect the broker and zookeeper's gc
> logs?
> > > >
> > > >
> > > > On Thu, Mar 20, 2014 at 3:26 PM, Andrew Otto <o...@wikimedia.org>
> > wrote:
> > > >
> > > > > Hm, just saw something a little fishy.
> > > > >
> > > > > (In the following logs, analytics1021 (id 21) and analytics1022 (id
> > 22)
> > > > > are Brokers and analytics1023,analytics1024,analytics1025 are
> > > > Zookeepers.)
> > > > >
> > > > > At 2014-03-20 21:12:26, analytics1021 lost its connection to
> > zookeeper.
> > > > It
> > > > > reconnected to analytics1023, but as it was doing so, it logged a
> > > > > 'zookeeper state changed (Expired)' message, which apparently
> caused
> > > the
> > > > > newly opened socket to close again.  Am I reading that right?
> > > > >
> > > > > In the time that it took analytics1021 to finally re-establish a
> > > > zookeeper
> > > > > connection, analytics1022 had become the leader for all partitions.
> >  I
> > > > can
> > > > > rebalance the leaders, but I'd prefer if this didn't happen in the
> > > first
> > > > > place.
> > > > >
> > > > > 1.  What does 'zookeeper state changed (Expired)' mean?
> > > > > 2.  Has anyone seen issues like this before?  Where zookeeper
> > > connections
> > > > > are flaky enough to cause leader elections?
> > > > >
> > > > > Thanks!
> > > > > -Andrew Otto
> > > > >
> > > > >
> > > > >
> > > > > [2014-03-20 21:12:26,427] 685698013
> > > > > [main-SendThread(analytics1025.eqiad.wmnet:2181)] INFO
> > > > > org.apache.zookeeper.ClientCnxn - Client session timed out, have
> not
> > > > heard
> > > > > from server in 10067ms for sessionid 0x1425d903a3f1f89, closing
> > socket
> > > > > connection and attempting reconnect
> > > > > [2014-03-20 21:12:26,528] 685698114 [main-EventThread] INFO
> > > > > org.I0Itec.zkclient.ZkClient - zookeeper state changed
> (Disconnected)
> > > > > [2014-03-20 21:12:27,252] 685698838
> > > > > [main-SendThread(analytics1025.eqiad.wmnet:2181)] INFO
> > > > > org.apache.zookeeper.ClientCnxn - Opening socket connection to
> server
> > > > > analytics1023.eqiad.wmnet/10.64.5.15:2181
> > > > > [2014-03-20 21:12:27,271] 685698857
> > > > > [main-SendThread(analytics1023.eqiad.wmnet:2181)] INFO
> > > > > org.apache.zookeeper.ClientCnxn - Socket connection established to
> > > > > analytics1023.eqiad.wmnet/10.64.5.15:2181, initiating session
> > > > > [2014-03-20 21:12:27,273] 685698859 [main-EventThread] INFO
> > > > > org.I0Itec.zkclient.ZkClient - zookeeper state changed (Expired)
> > > > > [2014-03-20 21:12:27,273] 685698859
> > > > > [main-SendThread(analytics1023.eqiad.wmnet:2181)] INFO
> > > > > org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper
> > > > service,
> > > > > session 0x1425d903a3f1f89 has expired, closing socket connection
> > > > > [2014-03-20 21:12:27,273] 685698859 [main-EventThread] INFO
> > > > > org.apache.zookeeper.ZooKeeper - Initiating client connection,
> > > > >
> > > >
> > >
> >
> connectString=analytics1023.eqiad.wmnet,analytics1024.eqiad.wmnet,analytics1025.eqiad.wmnet/kafka/eqiad
> > > > > sessionTimeout=6000 watcher=org.I0Itec.zkclient.ZkClient@63203b59
> > > > > [2014-03-20 21:12:27,290] 685698876
> > > > >
> > > >
> > >
> >
> [ZkClient-EventThread-24-analytics1023.eqiad.wmnet,analytics1024.eqiad.wmnet,analytics1025.eqiad.wmnet/kafka/eqiad]
> > > > > INFO kafka.server.KafkaZooKeeper - re-registering broker info in ZK
> > for
> > > > > broker 21
> > > > > [2014-03-20 21:12:27,290] 685698876 [main-EventThread] INFO
> > > > > org.apache.zookeeper.ClientCnxn - EventThread shut down
> > > > > [2014-03-20 21:12:27,290] 685698876 [main-SendThread()] INFO
> > > > > org.apache.zookeeper.ClientCnxn - Opening socket connection to
> server
> > > > > analytics1023.eqiad.wmnet/10.64.5.15:2181
> > > > > [2014-03-20 21:12:27,292] 685698878
> > > > > [main-SendThread(analytics1023.eqiad.wmnet:2181)] INFO
> > > > > org.apache.zookeeper.ClientCnxn - Socket connection established to
> > > > > analytics1023.eqiad.wmnet/10.64.5.15:2181, initiating session
> > > > > [2014-03-20 21:12:27,313] 685698899
> > > > > [main-SendThread(analytics1023.eqiad.wmnet:2181)] INFO
> > > > > org.apache.zookeeper.ClientCnxn - Session establishment complete on
> > > > server
> > > > > analytics1023.eqiad.wmnet/10.64.5.15:2181, sessionid =
> > > > > 0xff429b4201dd2352, negotiated timeout = 6000
> > > > > [2014-03-20 21:12:27,314] 685698900 [main-EventThread] INFO
> > > > > org.I0Itec.zkclient.ZkClient - zookeeper state changed
> > (SyncConnected)
> > > > > [2014-03-20 21:12:27,324] 685698910
> > > > >
> > > >
> > >
> >
> [ZkClient-EventThread-24-analytics1023.eqiad.wmnet,analytics1024.eqiad.wmnet,analytics1025.eqiad.wmnet/kafka/eqiad]
> > > > > INFO kafka.server.KafkaZooKeeper - done re-registering broker
> > > > > [2014-03-20 21:12:27,324] 685698910
> > > > >
> > > >
> > >
> >
> [ZkClient-EventThread-24-analytics1023.eqiad.wmnet,analytics1024.eqiad.wmnet,analytics1025.eqiad.wmnet/kafka/eqiad]
> > > > > INFO kafka.server.KafkaZooKeeper - Subscribing to /brokers/topics
> > path
> > > to
> > > > > watch for new topics
> > > > > [2014-03-20 21:12:27,370] 685698956
> > > > >
> > > >
> > >
> >
> [ZkClient-EventThread-24-analytics1023.eqiad.wmnet,analytics1024.eqiad.wmnet,analytics1025.eqiad.wmnet/kafka/eqiad]
> > > > > INFO kafka.server.ZookeeperLeaderElector$LeaderChangeListener - New
> > > > leader
> > > > > is 22
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to