While on the subject of zkclient, also consider KAFKA-1793.  A more
abstract interface to the distributed coordination service that could be
configured to use alternatives like consul or etcd would be very useful
imho.
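
A minimal sketch of what such an abstraction might look like. Every name below (CoordinationStore, InMemoryCoordinationStore and their methods) is hypothetical and is not taken from KAFKA-1793 or from any existing library; the in-memory implementation only shows that callers would depend on the interface rather than on a ZooKeeper client.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical interface: a small set of operations a broker might need
// from a coordination service, independent of the backend in use.
interface CoordinationStore {
    Optional<String> read(String path);
    void write(String path, String data);
    boolean delete(String path);
}

// Hypothetical in-memory backend, used here only for illustration.
// A ZooKeeper, consul or etcd backend would implement the same interface.
final class InMemoryCoordinationStore implements CoordinationStore {
    private final Map<String, String> nodes = new ConcurrentHashMap<>();

    @Override
    public Optional<String> read(String path) {
        return Optional.ofNullable(nodes.get(path));
    }

    @Override
    public void write(String path, String data) {
        nodes.put(path, data);
    }

    @Override
    public boolean delete(String path) {
        // Returns true only if the path existed.
        return nodes.remove(path) != null;
    }
}
```

Which backend gets constructed could then be a configuration decision, with the rest of the code unaware of the choice.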

Dana

FWIW - the ZkClient project team have merged the pull request that I had
submitted to allow for timeouts on operations:
https://github.com/sgroschupf/zkclient/pull/29. I heard from Johannes (from
the ZkClient project team) that they don't have any specific release date in
mind but are willing to release a new version if/when we need one.
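
For illustration only, here is a minimal sketch of the general idea of bounding a blocking "wait until connected" call with a timeout. It uses only java.util.concurrent; ConnectionGate and its methods are hypothetical stand-ins and are neither the code from the pull request nor ZkClient's actual API.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a client's connection state.
final class ConnectionGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition stateChanged = lock.newCondition();
    private boolean connected = false;

    // Called by whichever thread observes the connection going up or down.
    void setConnected(boolean value) {
        lock.lock();
        try {
            connected = value;
            stateChanged.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Waits until connected or until the timeout elapses.
    // Returns true if connected, false on timeout or interruption.
    boolean waitUntilConnected(long timeout, TimeUnit unit) {
        long remainingNanos = unit.toNanos(timeout);
        lock.lock();
        try {
            while (!connected) {
                if (remainingNanos <= 0L) {
                    return false; // give up instead of retrying forever
                }
                try {
                    // awaitNanos returns an estimate of the time still left.
                    remainingNanos = stateChanged.awaitNanos(remainingNanos);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return false;
                }
            }
            return true;
        } finally {
            lock.unlock();
        }
    }
}
```

A caller on a shutdown path could then treat a false return as "the coordination service is unreachable" and continue shutting down rather than blocking.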

-Jaikiran

On Wednesday 04 February 2015 12:33 AM, Gwen Shapira wrote:

> So I think the current plan is:
> 1. Add timeout in zkclient
> 2. Ask zkclient to release a new version (we need it for a few other things
> too)
> 3. Rebase on new zkclient
> 4. Fix this jira and the few others that were waiting for the new zkclient
>
> Does that make sense?
>
> Gwen
>
> On Mon, Feb 2, 2015 at 8:33 PM, Jaikiran Pai <jai.forums2...@gmail.com>
> wrote:
>
>> I just heard back from Stefan, who manages the ZkClient repo, and he seems
>> to be open to having these changes be part of the ZkClient project. I'll be
>> creating a pull request for that project to have it reviewed and merged.
>> Although I haven't heard of exact release plans, Stefan's reply did indicate
>> that the project could be released after this change is merged.
>>
>> -Jaikiran
>>
>> On Tuesday 03 February 2015 09:03 AM, Jaikiran Pai wrote:
>>
>>> Thanks for pointing to that repo!
>>>
>>> I just had a look at it and it appears that the project isn't very active
>>> (going by the lack of activity). The latest contribution is from Gwen and
>>> that was around 3 months back. I haven't found release plans for that
>>> project or a place to ask about it (filing an issue doesn't seem right to
>>> ask this question). So I'll get in touch with the repo owner and see what
>>> his plans for the project are.
>>>
>>> -Jaikiran
>>>
>>> On Monday 02 February 2015 11:33 PM, Gwen Shapira wrote:
>>>
>>>> I did!
>>>>
>>>> Thanks for clarifying :)
>>>>
>>>> The client that is part of Zookeeper itself actually does support
>>>> timeouts.
>>>>
>>>> On Mon, Feb 2, 2015 at 9:54 AM, Guozhang Wang <wangg...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Jaikiran,
>>>>>
>>>>> I think Gwen was talking about contributing to ZkClient project:
>>>>>
>>>>> https://github.com/sgroschupf/zkclient
>>>>>
>>>>> Guozhang
>>>>>
>>>>>
>>>>> On Sun, Feb 1, 2015 at 5:30 AM, Jaikiran Pai <jai.forums2...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Gwen,
>>>>>>
>>>>>> Yes, the KafkaZkClient is a wrapper around ZkClient and not a complete
>>>>>> replacement.
>>>>>>
>>>>>> As for contributing to Zookeeper, yes, that is indeed on my mind, but I
>>>>>> haven't yet had a chance to really look deeper into Zookeeper or get in
>>>>>> touch with their dev team to try and explain this potential improvement
>>>>>> to them. I have no objection to contributing this or something similar to
>>>>>> Zookeeper directly. I think I should be able to bring this up in the
>>>>>> Zookeeper dev forum sometime soon, in the next few weekends.
>>>>>>
>>>>>> -Jaikiran
>>>>>>
>>>>>>
>>>>>> On Sunday 01 February 2015 11:40 AM, Gwen Shapira wrote:
>>>>>>
>>>>>>> It looks like the new KafkaZkClient is a wrapper around ZkClient, but
>>>>>>> not a replacement. Did I get it right?
>>>>>>>
>>>>>>> I think a wrapper for ZkClient can be useful - for example KAFKA-1664
>>>>>>> can also use one.
>>>>>>>
>>>>>>> However, I'm wondering why not contribute the fix directly to the
>>>>>>> ZKClient project and ask for a release that contains the fix?
>>>>>>> This will benefit other users of the project who may also need a
>>>>>>> timeout (that's pretty basic...)
>>>>>>>
>>>>>>> As an alternative, if we don't want to collaborate with ZKClient for
>>>>>>> some reason, forking the project into Kafka will probably give us more
>>>>>>> control than wrappers and without much downside.
>>>>>>>
>>>>>>> Just a thought.
>>>>>>>
>>>>>>> Gwen
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Jan 31, 2015 at 6:32 AM, Jaikiran Pai
>>>>>>> <jai.forums2...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Neha, Ewen (and others), my initial attempt to solve this is uploaded
>>>>>>>> here: https://reviews.apache.org/r/30477/. It solves the shutdown problem
>>>>>>>> and now the server shuts down even when Zookeeper has gone down before
>>>>>>>> the Kafka server.
>>>>>>>>
>>>>>>>> I went with the approach of introducing a custom (enhanced) ZkClient
>>>>>>>> which for now allows timeouts to be optionally specified for certain
>>>>>>>> operations. I intentionally haven't forced the use of this new
>>>>>>>> KafkaZkClient all over the code and instead for now have just used it in
>>>>>>>> the KafkaServer.
>>>>>>>>
>>>>>>>> Does this patch look like something worth using?
>>>>>>>>
>>>>>>>> -Jaikiran
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thursday 29 January 2015 10:41 PM, Neha Narkhede wrote:
>>>>>>>>
>>>>>>>>> Ewen is right. ZkClient APIs are blocking and the right fix for this
>>>>>>>>> seems to be patching ZkClient. At some point, if we find ourselves
>>>>>>>>> fiddling too much with ZkClient, it wouldn't hurt to write our own
>>>>>>>>> little zookeeper client wrapper.
>>>>>>>>>
>>>>>>>>> On Thu, Jan 29, 2015 at 12:57 AM, Ewen Cheslack-Postava
>>>>>>>>> <e...@confluent.io>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Looks like a bug to me -- the underlying ZK library wraps a lot of
>>>>>>>>>> blocking method implementations with waitUntilConnected() calls
>>>>>>>>>> without any timeouts. Ideally we could just add a version of
>>>>>>>>>> ZkUtils.getController() with a timeout, but I don't see an easy way to
>>>>>>>>>> accomplish that with ZkClient.
>>>>>>>>>>
>>>>>>>>>> There's at least one other call to ZkUtils besides the one in the
>>>>>>>>>> stacktrace you gave that would cause the same issue, possibly more
>>>>>>>>>> that aren't directly called in that method. One ugly solution would be
>>>>>>>>>> to use an extra thread during shutdown to trigger timeouts, but I'd
>>>>>>>>>> imagine we probably have other threads that could end up blocking in
>>>>>>>>>> similar ways.
>>>>>>>>>>
>>>>>>>>>> I filed https://issues.apache.org/jira/browse/KAFKA-1907 to track the
>>>>>>>>>> issue.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Jan 26, 2015 at 6:35 AM, Jaikiran Pai <
>>>>>>>>>> jai.forums2...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> The main culprit is this thread which goes into "forever retry
>>>>>>>>>>> connection to a closed zookeeper" when I shut down Kafka (via a
>>>>>>>>>>> Ctrl + C) after zookeeper has already been shut down. I have attached
>>>>>>>>>>> the complete thread dump, but I don't know if it will be delivered to
>>>>>>>>>>> the mailing list.
>>>>>>>>>>>
>>>>>>>>>>> "Thread-2" prio=10 tid=0xb3305000 nid=0x4758 waiting on condition [0x6ad69000]
>>>>>>>>>>>    java.lang.Thread.State: TIMED_WAITING (parking)
>>>>>>>>>>>         at sun.misc.Unsafe.park(Native Method)
>>>>>>>>>>>         - parking to wait for  <0x70a93368> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>>>>>>>>>>>         at java.util.concurrent.locks.LockSupport.parkUntil(LockSupport.java:267)
>>>>>>>>>>>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUntil(AbstractQueuedSynchronizer.java:2130)
>>>>>>>>>>>         at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:636)
>>>>>>>>>>>         at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:619)
>>>>>>>>>>>         at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:615)
>>>>>>>>>>>         at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:679)
>>>>>>>>>>>         at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766)
>>>>>>>>>>>         at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
>>>>>>>>>>>         at kafka.utils.ZkUtils$.readDataMaybeNull(ZkUtils.scala:456)
>>>>>>>>>>>         at kafka.utils.ZkUtils$.getController(ZkUtils.scala:65)
>>>>>>>>>>>         at kafka.server.KafkaServer.kafka$server$KafkaServer$$controlledShutdown(KafkaServer.scala:194)
>>>>>>>>>>>         at kafka.server.KafkaServer$$anonfun$shutdown$1.apply$mcV$sp(KafkaServer.scala:269)
>>>>>>>>>>>         at kafka.utils.Utils$.swallow(Utils.scala:172)
>>>>>>>>>>>         at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
>>>>>>>>>>>         at kafka.utils.Utils$.swallowWarn(Utils.scala:45)
>>>>>>>>>>>         at kafka.utils.Logging$class.swallow(Logging.scala:94)
>>>>>>>>>>>         at kafka.utils.Utils$.swallow(Utils.scala:45)
>>>>>>>>>>>         at kafka.server.KafkaServer.shutdown(KafkaServer.scala:269)
>>>>>>>>>>>         at kafka.server.KafkaServerStartable.shutdown(KafkaServerStartable.scala:42)
>>>>>>>>>>>         at kafka.Kafka$$anon$1.run(Kafka.scala:42)
>>>>>>>>>>>
>>>>>>>>>>> -Jaikiran
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Monday 26 January 2015 05:46 AM, Neha Narkhede wrote:
>>>>>>>>>>>
>>>>>>>>>>>> For a clean shutdown, the broker tries to talk to the controller and
>>>>>>>>>>>> also issues reads to zookeeper. Possibly that is where it tries to
>>>>>>>>>>>> reconnect to zk. It will help to look at the thread dump.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>>> Neha
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jan 23, 2015 at 8:53 PM, Jaikiran Pai <jai.forums2...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I was just playing around with the RC2 of 0.8.2 and noticed that if I
>>>>>>>>>>>>> shut down zookeeper first I can't shut down the Kafka server at all,
>>>>>>>>>>>>> since it goes into a never-ending attempt to reconnect with zookeeper.
>>>>>>>>>>>>> I had to kill the Kafka process to stop it. I tried it against trunk
>>>>>>>>>>>>> too and I see the same issue there. Should I file a JIRA for this and
>>>>>>>>>>>>> see if I can come up with a patch?
>>>>>>>>>>>>>
>>>>>>>>>>>>> FWIW, here are the unending (and IMO too frequent) attempts at trying
>>>>>>>>>>>>> to reconnect. I have a thread dump too, which shows that the other
>>>>>>>>>>>>> thread, which is trying to complete a controlled shutdown of Kafka, is
>>>>>>>>>>>>> blocked forever waiting for zookeeper to be up. I can attach it to the
>>>>>>>>>>>>> JIRA.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> [2015-01-24 10:15:46,278] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
>>>>>>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>>>>>>>>>>>> [2015-01-24 10:15:47,437] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
>>>>>>>>>>>>> [2015-01-24 10:15:47,438] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
>>>>>>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>>>>>>>>>>>> [2015-01-24 10:15:49,056] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
>>>>>>>>>>>>> [2015-01-24 10:15:49,057] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
>>>>>>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>>>>>>>>>>>> [2015-01-24 10:15:50,801] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
>>>>>>>>>>>>> [2015-01-24 10:15:50,802] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
>>>>>>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Jaikiran
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>    --
>>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>> Ewen
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>  --
>>>>> -- Guozhang
>>>>>
>>>>
>>>
