Hi Jaikiran,

I think Gwen was talking about contributing to the ZkClient project:
https://github.com/sgroschupf/zkclient

Guozhang

On Sun, Feb 1, 2015 at 5:30 AM, Jaikiran Pai <jai.forums2...@gmail.com> wrote:

> Hi Gwen,
>
> Yes, the KafkaZkClient is a wrapper around ZkClient and not a complete
> replacement.
>
> As for contributing to Zookeeper, yes, that is indeed on my mind, but I
> haven't yet had a chance to really look deeper into Zookeeper or get in
> touch with their dev team to try and explain this potential improvement to
> them. I have no objection to contributing this or something similar to
> Zookeeper directly. I think I should be able to bring this up in the
> Zookeeper dev forum sometime in the next few weekends.
>
> -Jaikiran
>
>
> On Sunday 01 February 2015 11:40 AM, Gwen Shapira wrote:
>
>> It looks like the new KafkaZkClient is a wrapper around ZkClient, but
>> not a replacement. Did I get it right?
>>
>> I think a wrapper for ZkClient can be useful - for example KAFKA-1664
>> can also use one.
>>
>> However, I'm wondering why not contribute the fix directly to the ZkClient
>> project and ask for a release that contains the fix?
>> This will benefit other users of the project who may also need a
>> timeout (that's pretty basic...)
>>
>> As an alternative, if we don't want to collaborate with ZkClient for
>> some reason, forking the project into Kafka will probably give us more
>> control than wrappers and without much downside.
>>
>> Just a thought.
>>
>> Gwen
>>
>>
>> On Sat, Jan 31, 2015 at 6:32 AM, Jaikiran Pai <jai.forums2...@gmail.com>
>> wrote:
>>
>>> Neha, Ewen (and others), my initial attempt to solve this is uploaded
>>> here: https://reviews.apache.org/r/30477/. It solves the shutdown problem,
>>> and now the server shuts down even when Zookeeper has gone down before the
>>> Kafka server.
>>>
>>> I went with the approach of introducing a custom (enhanced) ZkClient
>>> which for now allows timeouts to be optionally specified for certain
>>> operations.
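Jaikiran's patch itself is on Review Board (r/30477) and is not reproduced here. The Java sketch below only illustrates the general idea of an optional timeout on the connection wait: the thread dump quoted later in this thread shows ZkClient.waitForKeeperState parking in Condition.awaitUntil, and this class bounds an equivalent wait with a caller-supplied deadline, returning false (or throwing TimeoutException) instead of blocking indefinitely. TimedConnectionWaiter, setConnected, waitUntilConnected and runWhenConnected are hypothetical names, not KafkaZkClient's or ZkClient's actual API, and unlike ZkClient's retryUntilConnected this sketch does not retry the operation after a connection loss.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch, not KafkaZkClient's or ZkClient's actual code:
// tracks a "connected" flag and lets callers wait for it with a deadline.
final class TimedConnectionWaiter {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition stateChanged = lock.newCondition();
    private boolean connected = false;

    // In a real client this would be driven by ZooKeeper watcher events.
    void setConnected(boolean value) {
        lock.lock();
        try {
            connected = value;
            stateChanged.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Returns true once connected, or false if the timeout elapses first.
    boolean waitUntilConnected(long timeout, TimeUnit unit) throws InterruptedException {
        long remainingNanos = unit.toNanos(timeout);
        lock.lock();
        try {
            while (!connected) {
                if (remainingNanos <= 0L) {
                    return false;
                }
                // awaitNanos returns an estimate of the time still left to wait;
                // the loop also guards against spurious wakeups.
                remainingNanos = stateChanged.awaitNanos(remainingNanos);
            }
            return true;
        } finally {
            lock.unlock();
        }
    }

    // Runs the operation once, but only if a connection exists within the
    // timeout, instead of waiting forever like an untimed wait.
    <T> T runWhenConnected(Callable<T> operation, long timeout, TimeUnit unit) throws Exception {
        if (!waitUntilConnected(timeout, unit)) {
            throw new TimeoutException("not connected within " + timeout + " " + unit);
        }
        return operation.call();
    }
}
```

A shutdown path could pass a short timeout and treat the TimeoutException as "ZooKeeper is unavailable"; what it should do next is a policy decision this sketch does not cover.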
>>>
>>> I intentionally haven't forced the use of this new KafkaZkClient all over
>>> the code, and instead, for now, have used it only in the KafkaServer.
>>>
>>> Does this patch look like something worth using?
>>>
>>> -Jaikiran
>>>
>>>
>>> On Thursday 29 January 2015 10:41 PM, Neha Narkhede wrote:
>>>
>>>> Ewen is right. ZkClient APIs are blocking, and the right fix for this
>>>> seems to be patching ZkClient. At some point, if we find ourselves
>>>> fiddling too much with ZkClient, it wouldn't hurt to write our own little
>>>> zookeeper client wrapper.
>>>>
>>>> On Thu, Jan 29, 2015 at 12:57 AM, Ewen Cheslack-Postava <e...@confluent.io>
>>>> wrote:
>>>>
>>>>> Looks like a bug to me -- the underlying ZK library wraps a lot of
>>>>> blocking method implementations with waitUntilConnected() calls without
>>>>> any timeouts. Ideally we could just add a version of
>>>>> ZkUtils.getController() with a timeout, but I don't see an easy way to
>>>>> accomplish that with ZkClient.
>>>>>
>>>>> There's at least one other call to ZkUtils besides the one in the
>>>>> stack trace you gave that would cause the same issue, possibly more that
>>>>> aren't directly called in that method. One ugly solution would be to use
>>>>> an extra thread during shutdown to trigger timeouts, but I'd imagine we
>>>>> probably have other threads that could end up blocking in similar ways.
>>>>>
>>>>> I filed https://issues.apache.org/jira/browse/KAFKA-1907 to track the
>>>>> issue.
>>>>>
>>>>>
>>>>> On Mon, Jan 26, 2015 at 6:35 AM, Jaikiran Pai <jai.forums2...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> The main culprit is this thread, which goes into a "forever retry
>>>>>> connection to a closed zookeeper" loop when I shut down Kafka (via
>>>>>> Ctrl + C) after zookeeper has already been shut down. I have attached
>>>>>> the complete thread dump, but I don't know if it will be delivered to
>>>>>> the mailing list.
>>>>>>
>>>>>> "Thread-2" prio=10 tid=0xb3305000 nid=0x4758 waiting on condition [0x6ad69000]
>>>>>>    java.lang.Thread.State: TIMED_WAITING (parking)
>>>>>>     at sun.misc.Unsafe.park(Native Method)
>>>>>>     - parking to wait for <0x70a93368> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>>>>>>     at java.util.concurrent.locks.LockSupport.parkUntil(LockSupport.java:267)
>>>>>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUntil(AbstractQueuedSynchronizer.java:2130)
>>>>>>     at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:636)
>>>>>>     at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:619)
>>>>>>     at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:615)
>>>>>>     at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:679)
>>>>>>     at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766)
>>>>>>     at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
>>>>>>     at kafka.utils.ZkUtils$.readDataMaybeNull(ZkUtils.scala:456)
>>>>>>     at kafka.utils.ZkUtils$.getController(ZkUtils.scala:65)
>>>>>>     at kafka.server.KafkaServer.kafka$server$KafkaServer$$controlledShutdown(KafkaServer.scala:194)
>>>>>>     at kafka.server.KafkaServer$$anonfun$shutdown$1.apply$mcV$sp(KafkaServer.scala:269)
>>>>>>     at kafka.utils.Utils$.swallow(Utils.scala:172)
>>>>>>     at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
>>>>>>     at kafka.utils.Utils$.swallowWarn(Utils.scala:45)
>>>>>>     at kafka.utils.Logging$class.swallow(Logging.scala:94)
>>>>>>     at kafka.utils.Utils$.swallow(Utils.scala:45)
>>>>>>     at kafka.server.KafkaServer.shutdown(KafkaServer.scala:269)
>>>>>>     at kafka.server.KafkaServerStartable.shutdown(KafkaServerStartable.scala:42)
>>>>>>     at kafka.Kafka$$anon$1.run(Kafka.scala:42)
>>>>>>
>>>>>> -Jaikiran
>>>>>>
>>>>>>
>>>>>> On Monday 26 January 2015 05:46 AM, Neha Narkhede wrote:
>>>>>>
>>>>>>> For a clean shutdown, the broker tries to talk to the controller and
>>>>>>> also issues reads to zookeeper. Possibly that is where it tries to
>>>>>>> reconnect to zk. It will help to look at the thread dump.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Neha
>>>>>>>
>>>>>>> On Fri, Jan 23, 2015 at 8:53 PM, Jaikiran Pai <jai.forums2...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I was just playing around with the RC2 of 0.8.2 and noticed that if I
>>>>>>>> shut down zookeeper first, I can't shut down the Kafka server at all,
>>>>>>>> since it goes into a never-ending attempt to reconnect with zookeeper.
>>>>>>>> I had to kill the Kafka process to stop it. I tried it against trunk
>>>>>>>> too and I see the same issue there. Should I file a JIRA for this and
>>>>>>>> see if I can come up with a patch?
>>>>>>>>
>>>>>>>> FWIW, here are the unending (and IMO too frequent) attempts at trying
>>>>>>>> to reconnect. I have a thread dump too, which shows that the other
>>>>>>>> thread, which is trying to complete a controlled shutdown of Kafka, is
>>>>>>>> blocked forever waiting for zookeeper to come up. I can attach it to
>>>>>>>> the JIRA.
>>>>>>>>
>>>>>>>> [2015-01-24 10:15:46,278] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>>>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>>>>>>     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>>>>>>> [2015-01-24 10:15:47,437] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
>>>>>>>> [2015-01-24 10:15:47,438] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>>>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>>>>>>     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>>>>>>> [2015-01-24 10:15:49,056] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
>>>>>>>> [2015-01-24 10:15:49,057] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>>>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>>>>>>     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>>>>>>> [2015-01-24 10:15:50,801] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
>>>>>>>> [2015-01-24 10:15:50,802] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>>>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>>>>>>     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>>>>>>>
>>>>>>>> -Jaikiran
>>>>>>>>
>>>>>
>>>>> --
>>>>> Thanks,
>>>>> Ewen
>>>>>
>
--
Guozhang