Curator handles SASL connections: https://issues.apache.org/jira/browse/KAFKA-1695
On Wed, Feb 4, 2015, at 06:10 AM, Jaikiran Pai wrote:
> FWIW - the ZkClient project team have merged the pull request that I had
> submitted to allow for timeouts to operations
> https://github.com/sgroschupf/zkclient/pull/29. I heard from Johannes
> (from the ZkClient project team) that they don't have any specific
> release date in mind but are willing to release a new version if/when we
> need one.
>
> -Jaikiran
>
> On Wednesday 04 February 2015 12:33 AM, Gwen Shapira wrote:
> > So I think the current plan is:
> > 1. Add timeout in zkclient
> > 2. Ask zkclient to release new version (we need it for a few other things too)
> > 3. Rebase on new zkclient
> > 4. Fix this jira and the few others that were waiting for the new zkclient
> >
> > Does that make sense?
> >
> > Gwen
> >
> > On Mon, Feb 2, 2015 at 8:33 PM, Jaikiran Pai <jai.forums2...@gmail.com>
> > wrote:
> >> I just heard back from Stefan, who manages the ZkClient repo and he seems to
> >> be open to have these changes be part of ZkClient project. I'll be creating
> >> a pull request for that project to have it reviewed and merged. Although I
> >> haven't heard of exact release plans, Stefan's reply did indicate that the
> >> project could be released after this change is merged.
> >>
> >> -Jaikiran
> >>
> >> On Tuesday 03 February 2015 09:03 AM, Jaikiran Pai wrote:
> >>> Thanks for pointing to that repo!
> >>>
> >>> I just had a look at it and it appears that the project isn't very active
> >>> (going by the lack of activity). The latest contribution is from Gwen and
> >>> that was around 3 months back. I haven't found release plans for that
> >>> project or a place to ask about it (filing an issue doesn't seem right to
> >>> ask this question). So I'll get in touch with the repo owner and see what
> >>> his plans for the project are.
> >>>
> >>> -Jaikiran
> >>>
> >>> On Monday 02 February 2015 11:33 PM, Gwen Shapira wrote:
> >>>> I did!
> >>>>
> >>>> Thanks for clarifying :)
> >>>>
> >>>> The client that is part of Zookeeper itself actually does support
> >>>> timeouts.
> >>>>
> >>>> On Mon, Feb 2, 2015 at 9:54 AM, Guozhang Wang <wangg...@gmail.com> wrote:
> >>>>> Hi Jaikiran,
> >>>>>
> >>>>> I think Gwen was talking about contributing to ZkClient project:
> >>>>>
> >>>>> https://github.com/sgroschupf/zkclient
> >>>>>
> >>>>> Guozhang
> >>>>>
> >>>>> On Sun, Feb 1, 2015 at 5:30 AM, Jaikiran Pai <jai.forums2...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi Gwen,
> >>>>>>
> >>>>>> Yes, the KafkaZkClient is a wrapper around ZkClient and not a complete
> >>>>>> replacement.
> >>>>>>
> >>>>>> As for contributing to Zookeeper, yes that indeed is on my mind, but I
> >>>>>> haven't yet had a chance to really look deeper into Zookeeper or get in
> >>>>>> touch with their dev team to try and explain this potential improvement to
> >>>>>> them. I have no objection to contributing this or something similar to
> >>>>>> Zookeeper directly. I think I should be able to bring this up in the
> >>>>>> Zookeeper dev forum, sometime soon in the next few weekends.
> >>>>>>
> >>>>>> -Jaikiran
> >>>>>>
> >>>>>> On Sunday 01 February 2015 11:40 AM, Gwen Shapira wrote:
> >>>>>>
> >>>>>>> It looks like the new KafkaZkClient is a wrapper around ZkClient, but
> >>>>>>> not a replacement. Did I get it right?
> >>>>>>>
> >>>>>>> I think a wrapper for ZkClient can be useful - for example KAFKA-1664
> >>>>>>> can also use one.
> >>>>>>>
> >>>>>>> However, I'm wondering why not contribute the fix directly to ZKClient
> >>>>>>> project and ask for a release that contains the fix?
> >>>>>>> This will benefit other users of the project who may also need a
> >>>>>>> timeout (that's pretty basic...)
> >>>>>>>
> >>>>>>> As an alternative, if we don't want to collaborate with ZKClient for
> >>>>>>> some reason, forking the project into Kafka will probably give us more
> >>>>>>> control than wrappers and without much downside.
> >>>>>>>
> >>>>>>> Just a thought.
> >>>>>>>
> >>>>>>> Gwen
> >>>>>>>
> >>>>>>> On Sat, Jan 31, 2015 at 6:32 AM, Jaikiran Pai <jai.forums2...@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Neha, Ewen (and others), my initial attempt to solve this is uploaded here
> >>>>>>>> https://reviews.apache.org/r/30477/. It solves the shutdown problem and now
> >>>>>>>> the server shuts down even when Zookeeper has gone down before the Kafka
> >>>>>>>> server.
> >>>>>>>>
> >>>>>>>> I went with the approach of introducing a custom (enhanced) ZkClient which
> >>>>>>>> for now allows time outs to be optionally specified for certain operations.
> >>>>>>>> I intentionally haven't forced the use of this new KafkaZkClient all over
> >>>>>>>> the code and instead for now have just used it in the KafkaServer.
> >>>>>>>>
> >>>>>>>> Does this patch look like something worth using?
> >>>>>>>>
> >>>>>>>> -Jaikiran
> >>>>>>>>
> >>>>>>>> On Thursday 29 January 2015 10:41 PM, Neha Narkhede wrote:
> >>>>>>>>
> >>>>>>>>> Ewen is right. ZkClient APIs are blocking and the right fix for this seems
> >>>>>>>>> to be patching ZkClient. At some point, if we find ourselves fiddling too
> >>>>>>>>> much with ZkClient, it wouldn't hurt to write our own little zookeeper
> >>>>>>>>> client wrapper.
> >>>>>>>>>
> >>>>>>>>> On Thu, Jan 29, 2015 at 12:57 AM, Ewen Cheslack-Postava <e...@confluent.io>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Looks like a bug to me -- the underlying ZK library wraps a lot of blocking
> >>>>>>>>>> method implementations with waitUntilConnected() calls without any
> >>>>>>>>>> timeouts. Ideally we could just add a version of ZkUtils.getController()
> >>>>>>>>>> with a timeout, but I don't see an easy way to accomplish that with
> >>>>>>>>>> ZkClient.
> >>>>>>>>>>
> >>>>>>>>>> There's at least one other call to ZkUtils besides the one in the
> >>>>>>>>>> stacktrace you gave that would cause the same issue, possibly more that
> >>>>>>>>>> aren't directly called in that method. One ugly solution would be to use an
> >>>>>>>>>> extra thread during shutdown to trigger timeouts, but I'd imagine we
> >>>>>>>>>> probably have other threads that could end up blocking in similar ways.
> >>>>>>>>>>
> >>>>>>>>>> I filed https://issues.apache.org/jira/browse/KAFKA-1907 to track the
> >>>>>>>>>> issue.
> >>>>>>>>>>
> >>>>>>>>>> On Mon, Jan 26, 2015 at 6:35 AM, Jaikiran Pai <jai.forums2...@gmail.com>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> The main culprit is this thread which goes into "forever retry connection
> >>>>>>>>>>> to a closed zookeeper" when I shutdown Kafka (via a Ctrl + C) after
> >>>>>>>>>>> zookeeper has already been shutdown. I have attached the complete thread
> >>>>>>>>>>> dump, but I don't know if it will be delivered to the mailing list.
> >>>>>>>>>>>
> >>>>>>>>>>> "Thread-2" prio=10 tid=0xb3305000 nid=0x4758 waiting on condition [0x6ad69000]
> >>>>>>>>>>>    java.lang.Thread.State: TIMED_WAITING (parking)
> >>>>>>>>>>>     at sun.misc.Unsafe.park(Native Method)
> >>>>>>>>>>>     - parking to wait for <0x70a93368> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> >>>>>>>>>>>     at java.util.concurrent.locks.LockSupport.parkUntil(LockSupport.java:267)
> >>>>>>>>>>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUntil(AbstractQueuedSynchronizer.java:2130)
> >>>>>>>>>>>     at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:636)
> >>>>>>>>>>>     at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:619)
> >>>>>>>>>>>     at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:615)
> >>>>>>>>>>>     at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:679)
> >>>>>>>>>>>     at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766)
> >>>>>>>>>>>     at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
> >>>>>>>>>>>     at kafka.utils.ZkUtils$.readDataMaybeNull(ZkUtils.scala:456)
> >>>>>>>>>>>     at kafka.utils.ZkUtils$.getController(ZkUtils.scala:65)
> >>>>>>>>>>>     at kafka.server.KafkaServer.kafka$server$KafkaServer$$controlledShutdown(KafkaServer.scala:194)
> >>>>>>>>>>>     at kafka.server.KafkaServer$$anonfun$shutdown$1.apply$mcV$sp(KafkaServer.scala:269)
> >>>>>>>>>>>     at kafka.utils.Utils$.swallow(Utils.scala:172)
> >>>>>>>>>>>     at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
> >>>>>>>>>>>     at kafka.utils.Utils$.swallowWarn(Utils.scala:45)
> >>>>>>>>>>>     at kafka.utils.Logging$class.swallow(Logging.scala:94)
> >>>>>>>>>>>     at kafka.utils.Utils$.swallow(Utils.scala:45)
> >>>>>>>>>>>     at kafka.server.KafkaServer.shutdown(KafkaServer.scala:269)
> >>>>>>>>>>>     at kafka.server.KafkaServerStartable.shutdown(KafkaServerStartable.scala:42)
> >>>>>>>>>>>     at kafka.Kafka$$anon$1.run(Kafka.scala:42)
> >>>>>>>>>>>
> >>>>>>>>>>> -Jaikiran
> >>>>>>>>>>>
> >>>>>>>>>>> On Monday 26 January 2015 05:46 AM, Neha Narkhede wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> For a clean shutdown, the broker tries to talk to the controller and also
> >>>>>>>>>>>> issues reads to zookeeper. Possibly that is where it tries to reconnect to
> >>>>>>>>>>>> zk. It will help to look at the thread dump.
> >>>>>>>>>>>> Thanks
> >>>>>>>>>>>> Neha
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Fri, Jan 23, 2015 at 8:53 PM, Jaikiran Pai <jai.forums2...@gmail.com>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> I was just playing around with the RC2 of 0.8.2 and noticed that if I
> >>>>>>>>>>>>> shutdown zookeeper first I can't shutdown Kafka server at all since it goes
> >>>>>>>>>>>>> into a never ending attempt to reconnect with zookeeper. I had to kill the
> >>>>>>>>>>>>> Kafka process to stop it. I tried it against trunk too and there too I see
> >>>>>>>>>>>>> the same issue. Should I file a JIRA for this and see if I can come up with
> >>>>>>>>>>>>> a patch?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> FWIW, here's the unending (and IMO too frequent) attempts at trying to
> >>>>>>>>>>>>> reconnect.
> >>>>>>>>>>>>> I've a thread dump too which shows that the other thread which is trying to
> >>>>>>>>>>>>> complete a controlled shutdown of Kafka is blocked forever for the zookeeper
> >>>>>>>>>>>>> to be up. I can attach it to the JIRA.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> [2015-01-24 10:15:46,278] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> >>>>>>>>>>>>> java.net.ConnectException: Connection refused
> >>>>>>>>>>>>>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> >>>>>>>>>>>>>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
> >>>>>>>>>>>>>     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
> >>>>>>>>>>>>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
> >>>>>>>>>>>>> [2015-01-24 10:15:47,437] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
> >>>>>>>>>>>>> [2015-01-24 10:15:47,438] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> >>>>>>>>>>>>> java.net.ConnectException: Connection refused
> >>>>>>>>>>>>>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> >>>>>>>>>>>>>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
> >>>>>>>>>>>>>     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
> >>>>>>>>>>>>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
> >>>>>>>>>>>>> [2015-01-24 10:15:49,056] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
> >>>>>>>>>>>>> [2015-01-24 10:15:49,057] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> >>>>>>>>>>>>> java.net.ConnectException: Connection refused
> >>>>>>>>>>>>>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> >>>>>>>>>>>>>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
> >>>>>>>>>>>>>     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
> >>>>>>>>>>>>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
> >>>>>>>>>>>>> [2015-01-24 10:15:50,801] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
> >>>>>>>>>>>>> [2015-01-24 10:15:50,802] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> >>>>>>>>>>>>> java.net.ConnectException: Connection refused
> >>>>>>>>>>>>>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> >>>>>>>>>>>>>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
> >>>>>>>>>>>>>     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
> >>>>>>>>>>>>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> -Jaikiran
> >>>>>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> Thanks,
> >>>>>>>>>> Ewen
> >>>>>>>>>>
> >>>>> --
> >>>>> -- Guozhang
> >>>
>
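
For anyone reading this thread later: the "extra thread during shutdown to
trigger timeouts" idea that Ewen mentions can be sketched roughly as below.
This is only an illustration under stated assumptions. It is not the patch
from https://reviews.apache.org/r/30477/ and it is not ZkClient's API.
BoundedCall and callWithTimeout are hypothetical names, and the lambdas in
main merely stand in for a blocking read such as ZkUtils.getController().
The helper runs the call on a daemon thread and stops waiting after a
deadline, so the calling (shutdown) thread is never parked indefinitely.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper for illustration; not part of Kafka, ZkClient or ZooKeeper. */
final class BoundedCall {

    private BoundedCall() {}

    /**
     * Runs blockingCall on a helper daemon thread and waits at most timeout
     * for its result. Returns Optional.empty() if the call did not finish in
     * time (a null result is also reported as empty).
     */
    static <T> Optional<T> callWithTimeout(Callable<T> blockingCall, Duration timeout) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-call");
            t.setDaemon(true); // a stuck call must not keep the JVM alive
            return t;
        });
        Future<T> future = executor.submit(blockingCall);
        try {
            return Optional.ofNullable(future.get(timeout.toMillis(), TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the helper thread
            return Optional.empty();
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return Optional.empty();
        } catch (ExecutionException e) {
            throw new IllegalStateException("blocking call failed", e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-in for a read that is answered promptly.
        Optional<String> fast = callWithTimeout(() -> "controller-1", Duration.ofMillis(500));
        System.out.println(fast); // prints "Optional[controller-1]"

        // Stand-in for a read that blocks forever because the server is gone.
        CountDownLatch neverReleased = new CountDownLatch(1);
        Optional<String> stuck = callWithTimeout(() -> {
            neverReleased.await();
            return "unreachable";
        }, Duration.ofMillis(200));
        System.out.println(stuck); // prints "Optional.empty"
    }
}
```

Note that this only unblocks the caller. On timeout the helper thread is
interrupted, and whether the underlying library call reacts to that
interrupt depends on the library, which is presumably part of why the
thread settles on adding a timeout inside ZkClient itself.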