Any reason not to go with Apache Curator (http://curator.apache.org/)?
-Harsha
On Tue, Feb 3, 2015, at 09:55 PM, Guozhang Wang wrote:
> I am also +1 on Neha's suggestion that "At some point, if we find
> ourselves fiddling too much with ZkClient, it wouldn't hurt to write our
> own little zookeeper client wrapper." We have accumulated a bunch of
> issues with ZkClient that take a long time to be resolved, if ever, so we
> have ended up with some hacky ways of handling ZkClient errors.
> 
> Guozhang
> 
> On Tue, Feb 3, 2015 at 7:47 PM, Jaikiran Pai <jai.forums2...@gmail.com>
> wrote:
> 
> > Yes, that's the plan :)
> >
> > -Jaikiran
> >
> > On Wednesday 04 February 2015 12:33 AM, Gwen Shapira wrote:
> >
> >> So I think the current plan is:
> >> 1. Add timeout in zkclient
> >> 2. Ask zkclient to release a new version (we need it for a few other
> >> things too)
> >> 3. Rebase on new zkclient
> >> 4. Fix this jira and the few others that were waiting for the new zkclient
> >>
> >> Does that make sense?
> >>
> >> Gwen
> >>
> >> On Mon, Feb 2, 2015 at 8:33 PM, Jaikiran Pai <jai.forums2...@gmail.com>
> >> wrote:
> >>
> >>> I just heard back from Stefan, who manages the ZkClient repo and he
> >>> seems to
> >>> be open to have these changes be part of ZkClient project. I'll be
> >>> creating
> >>> a pull request for that project to have it reviewed and merged. Although
> >>> I
> >>> haven't heard of exact release plans, Stefan's reply did indicate that
> >>> the
> >>> project could be released after this change is merged.
> >>>
> >>> -Jaikiran
> >>>
> >>> On Tuesday 03 February 2015 09:03 AM, Jaikiran Pai wrote:
> >>>
> >>>> Thanks for pointing to that repo!
> >>>>
> >>>> I just had a look at it and it appears that the project isn't very
> >>>> active
> >>>> (going by the lack of activity). The latest contribution is from Gwen
> >>>> and
> >>>> that was around 3 months back. I haven't found release plans for that
> >>>> project or a place to ask about it (filing an issue doesn't seem right
> >>>> to
> >>>> ask this question). So I'll get in touch with the repo owner and see
> >>>> what
> >>>> his plans for the project are.
> >>>>
> >>>> -Jaikiran
> >>>>
> >>>> On Monday 02 February 2015 11:33 PM, Gwen Shapira wrote:
> >>>>
> >>>>> I did!
> >>>>>
> >>>>> Thanks for clarifying :)
> >>>>>
> >>>>> The client that is part of Zookeeper itself actually does support
> >>>>> timeouts.
> >>>>>
> >>>>> On Mon, Feb 2, 2015 at 9:54 AM, Guozhang Wang <wangg...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi Jaikiran,
> >>>>>>
> >>>>>> I think Gwen was talking about contributing to ZkClient project:
> >>>>>>
> >>>>>> https://github.com/sgroschupf/zkclient
> >>>>>>
> >>>>>> Guozhang
> >>>>>>
> >>>>>>
> >>>>>> On Sun, Feb 1, 2015 at 5:30 AM, Jaikiran Pai <
> >>>>>> jai.forums2...@gmail.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>  Hi Gwen,
> >>>>>>>
> >>>>>>> Yes, the KafkaZkClient is a wrapper around ZkClient and not a
> >>>>>>> complete
> >>>>>>> replacement.
> >>>>>>>
> >>>>>>> As for contributing to Zookeeper, yes that is indeed on my mind, but
> >>>>>>> I
> >>>>>>> haven't yet had a chance to really look deeper into Zookeeper or get
> >>>>>>> in
> >>>>>>> touch with their dev team to try and explain this potential
> >>>>>>> improvement
> >>>>>>> to
> >>>>>>> them. I have no objection to contributing this or something similar
> >>>>>>> to
> >>>>>>> Zookeeper directly. I think I should be able to bring this up in the
> >>>>>>> Zookeeper dev forum, sometime soon in the next few weekends.
> >>>>>>>
> >>>>>>> -Jaikiran
> >>>>>>>
> >>>>>>>
> >>>>>>> On Sunday 01 February 2015 11:40 AM, Gwen Shapira wrote:
> >>>>>>>
> >>>>>>>  It looks like the new KafkaZkClient is a wrapper around ZkClient,
> >>>>>>>> but
> >>>>>>>> not a replacement. Did I get it right?
> >>>>>>>>
> >>>>>>>> I think a wrapper for ZkClient can be useful - for example
> >>>>>>>> KAFKA-1664
> >>>>>>>> can also use one.
> >>>>>>>>
> >>>>>>>> However, I'm wondering why not contribute the fix directly to
> >>>>>>>> ZKClient
> >>>>>>>> project and ask for a release that contains the fix?
> >>>>>>>> This will benefit other users of the project who may also need a
> >>>>>>>> timeout (that's pretty basic...)
> >>>>>>>>
> >>>>>>>> As an alternative, if we don't want to collaborate with ZKClient for
> >>>>>>>> some reason, forking the project into Kafka will probably give us
> >>>>>>>> more
> >>>>>>>> control than wrappers and without much downside.
> >>>>>>>>
> >>>>>>>> Just a thought.
> >>>>>>>>
> >>>>>>>> Gwen
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Sat, Jan 31, 2015 at 6:32 AM, Jaikiran Pai
> >>>>>>>> <jai.forums2...@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>  Neha, Ewen (and others), my initial attempt to solve this is
> >>>>>>>>> uploaded
> >>>>>>>>> here
> >>>>>>>>> https://reviews.apache.org/r/30477/. It solves the shutdown
> >>>>>>>>> problem
> >>>>>>>>> and
> >>>>>>>>> now
> >>>>>>>>> the server shuts down even when Zookeeper has gone down before the
> >>>>>>>>> Kafka
> >>>>>>>>> server.
> >>>>>>>>>
> >>>>>>>>> I went with the approach of introducing a custom (enhanced)
> >>>>>>>>> ZkClient
> >>>>>>>>> which
> >>>>>>>>> for now allows time outs to be optionally specified for certain
> >>>>>>>>> operations.
> >>>>>>>>> I intentionally haven't forced the use of this new KafkaZkClient
> >>>>>>>>> all
> >>>>>>>>> over
> >>>>>>>>> the code and instead for now have just used it in the KafkaServer.
> >>>>>>>>>
> >>>>>>>>> Does this patch look like something worth using?
> >>>>>>>>>
> >>>>>>>>> -Jaikiran
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Thursday 29 January 2015 10:41 PM, Neha Narkhede wrote:
> >>>>>>>>>
> >>>>>>>>>  Ewen is right. ZkClient APIs are blocking and the right fix for
> >>>>>>>>>> this
> >>>>>>>>>> seems
> >>>>>>>>>> to be patching ZkClient. At some point, if we find ourselves
> >>>>>>>>>> fiddling
> >>>>>>>>>> too
> >>>>>>>>>> much with ZkClient, it wouldn't hurt to write our own little
> >>>>>>>>>> zookeeper
> >>>>>>>>>> client wrapper.
> >>>>>>>>>>
> >>>>>>>>>> On Thu, Jan 29, 2015 at 12:57 AM, Ewen Cheslack-Postava
> >>>>>>>>>> <e...@confluent.io>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>    Looks like a bug to me -- the underlying ZK library wraps a
> >>>>>>>>>> lot of
> >>>>>>>>>>
> >>>>>>>>>>> blocking
> >>>>>>>>>>> method implementations with waitUntilConnected() calls without
> >>>>>>>>>>> any
> >>>>>>>>>>> timeouts. Ideally we could just add a version of
> >>>>>>>>>>> ZkUtils.getController()
> >>>>>>>>>>> with a timeout, but I don't see an easy way to accomplish that
> >>>>>>>>>>> with
> >>>>>>>>>>> ZkClient.
> >>>>>>>>>>>
> >>>>>>>>>>> There's at least one other call to ZkUtils besides the one in the
> >>>>>>>>>>> stacktrace you gave that would cause the same issue, possibly
> >>>>>>>>>>> more
> >>>>>>>>>>> that
> >>>>>>>>>>> aren't directly called in that method. One ugly solution would be
> >>>>>>>>>>> to
> >>>>>>>>>>> use
> >>>>>>>>>>> an
> >>>>>>>>>>> extra thread during shutdown to trigger timeouts, but I'd imagine
> >>>>>>>>>>> we
> >>>>>>>>>>> probably have other threads that could end up blocking in similar
> >>>>>>>>>>> ways.
> >>>>>>>>>>>
> >>>>>>>>>>> I filed https://issues.apache.org/jira/browse/KAFKA-1907 to
> >>>>>>>>>>> track
> >>>>>>>>>>> the
> >>>>>>>>>>> issue.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Mon, Jan 26, 2015 at 6:35 AM, Jaikiran Pai <
> >>>>>>>>>>> jai.forums2...@gmail.com>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>    The main culprit is this thread which goes into "forever retry
> >>>>>>>>>>>
> >>>>>>>>>>>> connection
> >>>>>>>>>>>> to a closed zookeeper" when I shutdown Kafka (via a Ctrl + C)
> >>>>>>>>>>>> after
> >>>>>>>>>>>> zookeeper has already been shutdown. I have attached the
> >>>>>>>>>>>> complete
> >>>>>>>>>>>> thread
> >>>>>>>>>>>> dump, but I don't know if it will be delivered to the mailing
> >>>>>>>>>>>> list.
> >>>>>>>>>>>>
> >>>>>>>>>>>> "Thread-2" prio=10 tid=0xb3305000 nid=0x4758 waiting on
> >>>>>>>>>>>> condition
> >>>>>>>>>>>> [0x6ad69000]
> >>>>>>>>>>>>        java.lang.Thread.State: TIMED_WAITING (parking)
> >>>>>>>>>>>>         at sun.misc.Unsafe.park(Native Method)
> >>>>>>>>>>>>         - parking to wait for  <0x70a93368> (a
> >>>>>>>>>>>> java.util.concurrent.locks.
> >>>>>>>>>>>> AbstractQueuedSynchronizer$ConditionObject)
> >>>>>>>>>>>>         at java.util.concurrent.locks.LockSupport.parkUntil(
> >>>>>>>>>>>> LockSupport.java:267)
> >>>>>>>>>>>>         at java.util.concurrent.locks.
> >>>>>>>>>>>> AbstractQueuedSynchronizer$
> >>>>>>>>>>>> ConditionObject.awaitUntil(AbstractQueuedSynchronizer.
> >>>>>>>>>>>> java:2130)
> >>>>>>>>>>>>         at
> >>>>>>>>>>>> org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.
> >>>>>>>>>>>> java:636)
> >>>>>>>>>>>>         at
> >>>>>>>>>>>> org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.
> >>>>>>>>>>>> java:619)
> >>>>>>>>>>>>         at
> >>>>>>>>>>>> org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.
> >>>>>>>>>>>> java:615)
> >>>>>>>>>>>>         at
> >>>>>>>>>>>>
> >>>>>>>>>>>>  org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.
> >>>>>>>>>>> java:679)
> >>>>>>>>>>>
> >>>>>>>>>>>          at org.I0Itec.zkclient.ZkClient.
> >>>>>>>>>>>> readData(ZkClient.java:766)
> >>>>>>>>>>>>         at org.I0Itec.zkclient.ZkClient.
> >>>>>>>>>>>> readData(ZkClient.java:761)
> >>>>>>>>>>>>         at
> >>>>>>>>>>>> kafka.utils.ZkUtils$.readDataMaybeNull(ZkUtils.scala:456)
> >>>>>>>>>>>>         at kafka.utils.ZkUtils$.getController(ZkUtils.scala:65)
> >>>>>>>>>>>>         at kafka.server.KafkaServer.kafka$server$KafkaServer$$
> >>>>>>>>>>>> controlledShutdown(KafkaServer.scala:194)
> >>>>>>>>>>>>         at kafka.server.KafkaServer$$
> >>>>>>>>>>>> anonfun$shutdown$1.apply$mcV$
> >>>>>>>>>>>> sp(KafkaServer.scala:269)
> >>>>>>>>>>>>         at kafka.utils.Utils$.swallow(Utils.scala:172)
> >>>>>>>>>>>>         at kafka.utils.Logging$class.
> >>>>>>>>>>>> swallowWarn(Logging.scala:92)
> >>>>>>>>>>>>         at kafka.utils.Utils$.swallowWarn(Utils.scala:45)
> >>>>>>>>>>>>         at kafka.utils.Logging$class.swallow(Logging.scala:94)
> >>>>>>>>>>>>         at kafka.utils.Utils$.swallow(Utils.scala:45)
> >>>>>>>>>>>>         at kafka.server.KafkaServer.shutdown(KafkaServer.scala:
> >>>>>>>>>>>> 269)
> >>>>>>>>>>>>         at kafka.server.KafkaServerStartable.shutdown(
> >>>>>>>>>>>> KafkaServerStartable.scala:42)
> >>>>>>>>>>>>         at kafka.Kafka$$anon$1.run(Kafka.scala:42)
> >>>>>>>>>>>>
> >>>>>>>>>>>> -Jaikiran
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Monday 26 January 2015 05:46 AM, Neha Narkhede wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>    For a clean shutdown, the broker tries to talk to the
> >>>>>>>>>>>> controller
> >>>>>>>>>>>> and
> >>>>>>>>>>>> also
> >>>>>>>>>>>> issues reads to zookeeper. Possibly that is where it tries to
> >>>>>>>>>>>>
> >>>>>>>>>>>>> reconnect
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>  to
> >>>>>>>>>>>> zk. It will help to look at the thread dump.
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks
> >>>>>>>>>>>>> Neha
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Fri, Jan 23, 2015 at 8:53 PM, Jaikiran Pai <
> >>>>>>>>>>>>> jai.forums2...@gmail.com
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>      I was just playing around with the RC2 of 0.8.2 and
> >>>>>>>>>>>>> noticed
> >>>>>>>>>>>>> that
> >>>>>>>>>>>>> if I
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>  shutdown zookeeper first I can't shutdown Kafka server at all
> >>>>>>>>>>>>>> since
> >>>>>>>>>>>>>> it
> >>>>>>>>>>>>>> goes
> >>>>>>>>>>>>>> into a never ending attempt to reconnect with zookeeper. I had
> >>>>>>>>>>>>>> to
> >>>>>>>>>>>>>> kill
> >>>>>>>>>>>>>> the
> >>>>>>>>>>>>>> Kafka process to stop it. I tried it against trunk too and
> >>>>>>>>>>>>>> there
> >>>>>>>>>>>>>> too I
> >>>>>>>>>>>>>> see
> >>>>>>>>>>>>>> the same issue. Should I file a JIRA for this and see if I can
> >>>>>>>>>>>>>> come
> >>>>>>>>>>>>>> up
> >>>>>>>>>>>>>> with
> >>>>>>>>>>>>>> a patch?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> FWIW, here's the unending (and IMO too frequent) attempts at
> >>>>>>>>>>>>>> trying
> >>>>>>>>>>>>>> to
> >>>>>>>>>>>>>> reconnect. I've a thread dump too which shows that the other
> >>>>>>>>>>>>>> thread
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>  which
> >>>>>>>>>>>>>
> >>>>>>>>>>>> is trying to complete a controlled shutdown of Kafka is blocked
> >>>>>>>>>>>>
> >>>>>>>>>>>>> forever
> >>>>>>>>>>>>>> for
> >>>>>>>>>>>>>> the zookeeper to be up. I can attach it to the JIRA.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> [2015-01-24 10:15:46,278] WARN Session 0x14b1a4136800000 for
> >>>>>>>>>>>>>> server
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>  null,
> >>>>>>>>>>>>>
> >>>>>>>>>>>> unexpected error, closing socket connection and attempting
> >>>>>>>>>>>> reconnect
> >>>>>>>>>>>>
> >>>>>>>>>>>>> (org.apache.zookeeper.ClientCnxn)
> >>>>>>>>>>>>>> java.net.ConnectException: Connection refused
> >>>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.checkConnect(Native
> >>>>>>>>>>>>>> Method)
> >>>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.finishConnect(
> >>>>>>>>>>>>>> SocketChannelImpl.java:739)
> >>>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxnSocketNIO.
> >>>>>>>>>>>>>> doTransport(
> >>>>>>>>>>>>>> ClientCnxnSocketNIO.java:361)
> >>>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxn$SendThread.run(
> >>>>>>>>>>>>>> ClientCnxn.java:1081)
> >>>>>>>>>>>>>> [2015-01-24 10:15:47,437] INFO Opening socket connection to
> >>>>>>>>>>>>>> server
> >>>>>>>>>>>>>> localhost/127.0.0.1:2181. Will not attempt to authenticate
> >>>>>>>>>>>>>> using
> >>>>>>>>>>>>>> SASL
> >>>>>>>>>>>>>> (unknown error) (org.apache.zookeeper.ClientCnxn)
> >>>>>>>>>>>>>> [2015-01-24 10:15:47,438] WARN Session 0x14b1a4136800000 for
> >>>>>>>>>>>>>> server
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>  null,
> >>>>>>>>>>>>>
> >>>>>>>>>>>> unexpected error, closing socket connection and attempting
> >>>>>>>>>>>> reconnect
> >>>>>>>>>>>>
> >>>>>>>>>>>>> (org.apache.zookeeper.ClientCnxn)
> >>>>>>>>>>>>>> java.net.ConnectException: Connection refused
> >>>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.checkConnect(Native
> >>>>>>>>>>>>>> Method)
> >>>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.finishConnect(
> >>>>>>>>>>>>>> SocketChannelImpl.java:739)
> >>>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxnSocketNIO.
> >>>>>>>>>>>>>> doTransport(
> >>>>>>>>>>>>>> ClientCnxnSocketNIO.java:361)
> >>>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxn$SendThread.run(
> >>>>>>>>>>>>>> ClientCnxn.java:1081)
> >>>>>>>>>>>>>> [2015-01-24 10:15:49,056] INFO Opening socket connection to
> >>>>>>>>>>>>>> server
> >>>>>>>>>>>>>> localhost/127.0.0.1:2181. Will not attempt to authenticate
> >>>>>>>>>>>>>> using
> >>>>>>>>>>>>>> SASL
> >>>>>>>>>>>>>> (unknown error) (org.apache.zookeeper.ClientCnxn)
> >>>>>>>>>>>>>> [2015-01-24 10:15:49,057] WARN Session 0x14b1a4136800000 for
> >>>>>>>>>>>>>> server
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>  null,
> >>>>>>>>>>>>>
> >>>>>>>>>>>> unexpected error, closing socket connection and attempting
> >>>>>>>>>>>> reconnect
> >>>>>>>>>>>>
> >>>>>>>>>>>>> (org.apache.zookeeper.ClientCnxn)
> >>>>>>>>>>>>>> java.net.ConnectException: Connection refused
> >>>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.checkConnect(Native
> >>>>>>>>>>>>>> Method)
> >>>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.finishConnect(
> >>>>>>>>>>>>>> SocketChannelImpl.java:739)
> >>>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxnSocketNIO.
> >>>>>>>>>>>>>> doTransport(
> >>>>>>>>>>>>>> ClientCnxnSocketNIO.java:361)
> >>>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxn$SendThread.run(
> >>>>>>>>>>>>>> ClientCnxn.java:1081)
> >>>>>>>>>>>>>> [2015-01-24 10:15:50,801] INFO Opening socket connection to
> >>>>>>>>>>>>>> server
> >>>>>>>>>>>>>> localhost/127.0.0.1:2181. Will not attempt to authenticate
> >>>>>>>>>>>>>> using
> >>>>>>>>>>>>>> SASL
> >>>>>>>>>>>>>> (unknown error) (org.apache.zookeeper.ClientCnxn)
> >>>>>>>>>>>>>> [2015-01-24 10:15:50,802] WARN Session 0x14b1a4136800000 for
> >>>>>>>>>>>>>> server
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>  null,
> >>>>>>>>>>>>>
> >>>>>>>>>>>> unexpected error, closing socket connection and attempting
> >>>>>>>>>>>> reconnect
> >>>>>>>>>>>>
> >>>>>>>>>>>>> (org.apache.zookeeper.ClientCnxn)
> >>>>>>>>>>>>>> java.net.ConnectException: Connection refused
> >>>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.checkConnect(Native
> >>>>>>>>>>>>>> Method)
> >>>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.finishConnect(
> >>>>>>>>>>>>>> SocketChannelImpl.java:739)
> >>>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxnSocketNIO.
> >>>>>>>>>>>>>> doTransport(
> >>>>>>>>>>>>>> ClientCnxnSocketNIO.java:361)
> >>>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxn$SendThread.run(
> >>>>>>>>>>>>>> ClientCnxn.java:1081)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -Jaikiran
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>    --
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks,
> >>>>>>>>>>> Ewen
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>  --
> >>>>>> -- Guozhang
> >>>>>>
> >>>>>
> >>>>
> >
> 
> 
> -- 
> -- Guozhang
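
For reference, the "extra thread during shutdown to trigger timeouts" idea mentioned in the thread can be sketched roughly as below. This is only an illustration under stated assumptions: BoundedCall and callWithTimeout are hypothetical names, not part of Kafka, ZkClient, or ZooKeeper, and the sketch uses only java.util.concurrent. It bounds how long the caller waits; whether the blocked call itself stops depends on how that call reacts to a thread interrupt.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper (an assumption for illustration, not Kafka or ZkClient code).
final class BoundedCall {
    private BoundedCall() {}

    // Runs blockingCall on a separate daemon thread and waits at most `timeout`.
    // Throws TimeoutException if the call has not finished in time.
    static <T> T callWithTimeout(Callable<T> blockingCall, long timeout, TimeUnit unit)
            throws TimeoutException, ExecutionException, InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-call");
            t.setDaemon(true); // a stuck call must not keep the JVM alive
            return t;
        });
        try {
            Future<T> future = executor.submit(blockingCall);
            return future.get(timeout, unit);
        } finally {
            // Requests an interrupt of the worker if it is still running.
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        try {
            // Stand-in for a read that blocks while ZooKeeper is down.
            callWithTimeout(() -> { Thread.sleep(5000); return "controller"; },
                    200, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            System.out.println("timed out, continuing with shutdown");
        }
    }
}
```

As noted in the thread, this only covers the one call that is wrapped; other threads blocked in similar calls would still need a timeout in the client itself.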
