Hi Jaikiran,

I think Gwen was talking about contributing to the ZkClient project:

https://github.com/sgroschupf/zkclient

Guozhang


On Sun, Feb 1, 2015 at 5:30 AM, Jaikiran Pai <jai.forums2...@gmail.com>
wrote:

> Hi Gwen,
>
> Yes, the KafkaZkClient is a wrapper around ZkClient and not a complete
> replacement.
>
> As for contributing to Zookeeper, yes, that is indeed on my mind, but I
> haven't yet had a chance to look more deeply into Zookeeper or to get in
> touch with their dev team to explain this potential improvement to
> them. I have no objection to contributing this or something similar to
> Zookeeper directly. I should be able to bring this up on the Zookeeper
> dev forum sometime in the next few weekends.
>
> -Jaikiran
>
>
> On Sunday 01 February 2015 11:40 AM, Gwen Shapira wrote:
>
>> It looks like the new KafkaZkClient is a wrapper around ZkClient, but
>> not a replacement. Did I get it right?
>>
>> I think a wrapper for ZkClient can be useful - for example KAFKA-1664
>> can also use one.
>>
>> However, I'm wondering why not contribute the fix directly to the ZkClient
>> project and ask for a release that contains the fix?
>> This will benefit other users of the project who may also need a
>> timeout (that's pretty basic...)
>>
>> As an alternative, if we don't want to collaborate with ZKClient for
>> some reason, forking the project into Kafka will probably give us more
>> control than wrappers and without much downside.
>>
>> Just a thought.
>>
>> Gwen
>>
>> On Sat, Jan 31, 2015 at 6:32 AM, Jaikiran Pai <jai.forums2...@gmail.com>
>> wrote:
>>
>>> Neha, Ewen (and others), my initial attempt to solve this is uploaded here:
>>> https://reviews.apache.org/r/30477/. It solves the shutdown problem, and
>>> now the server shuts down even when Zookeeper has gone down before the
>>> Kafka server.
>>>
>>> I went with the approach of introducing a custom (enhanced) ZkClient
>>> which, for now, allows timeouts to be optionally specified for certain
>>> operations. I intentionally haven't forced the use of this new
>>> KafkaZkClient all over the code; for now I have only used it in
>>> KafkaServer.
>>>
>>> Does this patch look like something worth using?
>>>
>>> -Jaikiran
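A minimal sketch of the kind of bounded operation described above, in Java with only java.util.concurrent. It mimics the retry-on-connection-loss loop that ZkClient.retryUntilConnected() runs in the thread dump further down. Unlike that loop, it stops with a TimeoutException once a caller-supplied deadline passes. BoundedRetry and ConnectionLossException are hypothetical names, not ZkClient or KafkaZkClient API. The actual patch at https://reviews.apache.org/r/30477/ may work differently.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: retry an operation while the connection is down,
// like ZkClient.retryUntilConnected(), but only until a deadline passes.
final class BoundedRetry {

    // Hypothetical stand-in for a "connection lost, try again" failure.
    static final class ConnectionLossException extends RuntimeException {
        ConnectionLossException(String message) {
            super(message);
        }
    }

    private BoundedRetry() {}

    static <T> T retryUntilConnected(Callable<T> op, long timeout, TimeUnit unit,
                                     long backoffMillis) throws Exception {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (true) {
            try {
                return op.call();
            } catch (ConnectionLossException e) {
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    // Give up instead of waiting forever for the server to return.
                    TimeoutException te = new TimeoutException(
                        "still disconnected after " + unit.toMillis(timeout) + " ms");
                    te.initCause(e);
                    throw te;
                }
                // Back off before the next attempt, but never sleep past the deadline.
                long remainingMillis = TimeUnit.NANOSECONDS.toMillis(remainingNanos);
                Thread.sleep(Math.max(1L, Math.min(backoffMillis, remainingMillis)));
            }
        }
    }
}
```

Any exception other than the connection-loss one propagates immediately. Only "server unreachable" is treated as retryable. The backoff also caps how often this loop retries against a dead server.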
>>>
>>>
>>> On Thursday 29 January 2015 10:41 PM, Neha Narkhede wrote:
>>>
>>>> Ewen is right. ZkClient APIs are blocking and the right fix for this
>>>> seems to be patching ZkClient. At some point, if we find ourselves
>>>> fiddling too much with ZkClient, it wouldn't hurt to write our own little
>>>> zookeeper client wrapper.
>>>>
>>>> On Thu, Jan 29, 2015 at 12:57 AM, Ewen Cheslack-Postava <e...@confluent.io>
>>>> wrote:
>>>>
>>>>> Looks like a bug to me -- the underlying ZK library wraps a lot of
>>>>> blocking method implementations with waitUntilConnected() calls without
>>>>> any timeouts. Ideally we could just add a version of
>>>>> ZkUtils.getController() with a timeout, but I don't see an easy way to
>>>>> accomplish that with ZkClient.
>>>>>
>>>>> There's at least one other call to ZkUtils besides the one in the stack
>>>>> trace you gave that would cause the same issue, and possibly more that
>>>>> aren't directly called in that method. One ugly solution would be to use
>>>>> an extra thread during shutdown to trigger timeouts, but I'd imagine we
>>>>> probably have other threads that could end up blocking in similar ways.
>>>>>
>>>>> I filed https://issues.apache.org/jira/browse/KAFKA-1907 to track the
>>>>> issue.
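Here is one reading of the "extra thread during shutdown" idea, sketched in Java. The possibly blocking call, for example the ZkUtils.getController() lookup from the stack trace below, runs on a helper thread. The caller stops waiting for it after a timeout. ShutdownTimeouts and callWithTimeout are hypothetical names. Future.cancel(true) only interrupts the helper thread. Whether the blocked ZkClient call actually aborts on interrupt is an assumption to verify. For that reason the helper is a daemon thread that cannot keep the JVM alive.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: bound how long shutdown waits on a blocking call.
final class ShutdownTimeouts {
    private ShutdownTimeouts() {}

    static <T> T callWithTimeout(Callable<T> call, long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException, TimeoutException {
        ExecutorService helper = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "shutdown-timeout-helper");
            t.setDaemon(true); // a stuck call must not keep the JVM alive
            return t;
        });
        Future<T> future = helper.submit(call);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            // Interrupt the helper; this only helps if the call honours interrupts.
            future.cancel(true);
            throw e;
        } finally {
            helper.shutdownNow();
        }
    }
}
```

As Ewen notes, this only covers calls made on the shutdown path. Other threads blocked in the same way would each need their own bound.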
>>>>>
>>>>>
>>>>> On Mon, Jan 26, 2015 at 6:35 AM, Jaikiran Pai <jai.forums2...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> The main culprit is this thread, which goes into a "forever retry
>>>>>> connection to a closed zookeeper" loop when I shut down Kafka (via
>>>>>> Ctrl + C) after zookeeper has already been shut down. I have attached
>>>>>> the complete thread dump, but I don't know if it will be delivered to
>>>>>> the mailing list.
>>>>>>
>>>>>> "Thread-2" prio=10 tid=0xb3305000 nid=0x4758 waiting on condition [0x6ad69000]
>>>>>>      java.lang.Thread.State: TIMED_WAITING (parking)
>>>>>>       at sun.misc.Unsafe.park(Native Method)
>>>>>>       - parking to wait for  <0x70a93368> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>>>>>>       at java.util.concurrent.locks.LockSupport.parkUntil(LockSupport.java:267)
>>>>>>       at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUntil(AbstractQueuedSynchronizer.java:2130)
>>>>>>       at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:636)
>>>>>>       at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:619)
>>>>>>       at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:615)
>>>>>>       at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:679)
>>>>>>       at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766)
>>>>>>       at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
>>>>>>       at kafka.utils.ZkUtils$.readDataMaybeNull(ZkUtils.scala:456)
>>>>>>       at kafka.utils.ZkUtils$.getController(ZkUtils.scala:65)
>>>>>>       at kafka.server.KafkaServer.kafka$server$KafkaServer$$controlledShutdown(KafkaServer.scala:194)
>>>>>>       at kafka.server.KafkaServer$$anonfun$shutdown$1.apply$mcV$sp(KafkaServer.scala:269)
>>>>>>       at kafka.utils.Utils$.swallow(Utils.scala:172)
>>>>>>       at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
>>>>>>       at kafka.utils.Utils$.swallowWarn(Utils.scala:45)
>>>>>>       at kafka.utils.Logging$class.swallow(Logging.scala:94)
>>>>>>       at kafka.utils.Utils$.swallow(Utils.scala:45)
>>>>>>       at kafka.server.KafkaServer.shutdown(KafkaServer.scala:269)
>>>>>>       at kafka.server.KafkaServerStartable.shutdown(KafkaServerStartable.scala:42)
>>>>>>       at kafka.Kafka$$anon$1.run(Kafka.scala:42)
>>>>>>
>>>>>> -Jaikiran
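In the dump above, the shutdown thread is parked in Condition.awaitUntil() underneath ZkClient.waitUntilConnected() and waitForKeeperState(). As noted in the reply above, these waits are reached without any caller-supplied timeout. The Java sketch below uses the same lock/condition pattern with a deadline chosen by the caller, so the wait returns false instead of parking until the server comes back. ConnectionStateTracker is a hypothetical class, not ZkClient's API.

```java
import java.util.Date;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a client's connection state.
final class ConnectionStateTracker {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition stateChanged = lock.newCondition();
    private boolean connected = false;

    // Called by a (hypothetical) event thread when the session state changes.
    void setConnected(boolean value) {
        lock.lock();
        try {
            connected = value;
            stateChanged.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Returns true once connected, or false if the deadline passes first.
    boolean waitUntilConnected(long time, TimeUnit unit) throws InterruptedException {
        Date deadline = new Date(System.currentTimeMillis() + unit.toMillis(time));
        lock.lock();
        try {
            while (!connected) {
                // awaitUntil returns false once the deadline has elapsed.
                if (!stateChanged.awaitUntil(deadline)) {
                    return connected;
                }
            }
            return true;
        } finally {
            lock.unlock();
        }
    }
}
```

A caller on the shutdown path could then log and skip the controlled-shutdown step when the wait returns false. The shutdown hook would no longer block indefinitely.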
>>>>>>
>>>>>>
>>>>>> On Monday 26 January 2015 05:46 AM, Neha Narkhede wrote:
>>>>>>
>>>>>>> For a clean shutdown, the broker tries to talk to the controller and
>>>>>>> also issues reads to zookeeper. Possibly that is where it tries to
>>>>>>> reconnect to zk. It will help to look at the thread dump.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Neha
>>>>>>>
>>>>>>> On Fri, Jan 23, 2015 at 8:53 PM, Jaikiran Pai <jai.forums2...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I was just playing around with the RC2 of 0.8.2 and noticed that if I
>>>>>>>> shut down zookeeper first, I can't shut down the Kafka server at all,
>>>>>>>> since it goes into a never-ending attempt to reconnect with zookeeper.
>>>>>>>> I had to kill the Kafka process to stop it. I tried it against trunk
>>>>>>>> too and I see the same issue there. Should I file a JIRA for this and
>>>>>>>> see if I can come up with a patch?
>>>>>>>>
>>>>>>>> FWIW, here are the unending (and IMO too frequent) attempts to
>>>>>>>> reconnect. I also have a thread dump which shows that the other
>>>>>>>> thread, the one trying to complete a controlled shutdown of Kafka, is
>>>>>>>> blocked forever waiting for zookeeper to be up. I can attach it to
>>>>>>>> the JIRA.
>>>>>>>>
>>>>>>>> [2015-01-24 10:15:46,278] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>>>        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>>>>>>        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>>>        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>>>>>>> [2015-01-24 10:15:47,437] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
>>>>>>>> [2015-01-24 10:15:47,438] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>>>        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>>>>>>        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>>>        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>>>>>>> [2015-01-24 10:15:49,056] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
>>>>>>>> [2015-01-24 10:15:49,057] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>>>        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>>>>>>        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>>>        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>>>>>>> [2015-01-24 10:15:50,801] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
>>>>>>>> [2015-01-24 10:15:50,802] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>>>        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>>>>>>        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>>>        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>>>>>>>
>>>>>>>> -Jaikiran
>>>>>>>>
>>>>> --
>>>>> Thanks,
>>>>> Ewen
>>>>>
>>>>>
>>>>
>


-- 
-- Guozhang
