Hi Gwen,
Yes, the KafkaZkClient is a wrapper around ZkClient and not a complete
replacement.
As for contributing to Zookeeper, yes, that is indeed on my mind, but I
haven't yet had a chance to really look deeper into Zookeeper or get in
touch with their dev team to try and explain this potential improvement
to them. I have no objection to contributing this, or something similar,
to Zookeeper directly. I think I should be able to bring this up in the
Zookeeper dev forum sometime in the next few weekends.
-Jaikiran
On Sunday 01 February 2015 11:40 AM, Gwen Shapira wrote:
It looks like the new KafkaZkClient is a wrapper around ZkClient, but
not a replacement. Did I get it right?
I think a wrapper for ZkClient can be useful; for example, KAFKA-1664
could also use one.
However, I'm wondering: why not contribute the fix directly to the ZkClient
project and ask for a release that contains it?
This would benefit other users of the project who may also need a
timeout (that's pretty basic...)
As an alternative, if we don't want to collaborate with ZkClient for
some reason, forking the project into Kafka would probably give us more
control than a wrapper, without much downside.
Just a thought.
Gwen
On Sat, Jan 31, 2015 at 6:32 AM, Jaikiran Pai <jai.forums2...@gmail.com> wrote:
Neha, Ewen (and others), my initial attempt to solve this is uploaded at
https://reviews.apache.org/r/30477/. It solves the shutdown problem: the
server now shuts down even when Zookeeper has gone down before the Kafka
server.
I went with the approach of introducing a custom (enhanced) ZkClient which,
for now, allows timeouts to be optionally specified for certain operations.
I intentionally haven't forced the use of this new KafkaZkClient all over
the code; for now I have only used it in KafkaServer.
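As an illustration of what "optionally specified timeouts" can mean for a wrapper, here is a minimal Java sketch. It is not the patch at r/30477; OptionalTimeoutWrapper and its call overloads are hypothetical names, and it assumes each wrapped operation can be passed in as a Callable. The one-argument overload keeps the existing blocking behaviour, while the timed overload runs the operation on a worker thread and abandons it through Future.get(timeout, unit).

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BoundedCallSketch {

    // Hypothetical wrapper, NOT the KafkaZkClient from the review: it shows one
    // way to make a timeout optional on top of calls that would otherwise block.
    static final class OptionalTimeoutWrapper implements AutoCloseable {
        // Daemon threads from a cached pool: a call that never returns neither
        // holds up later calls nor keeps the JVM alive.
        private final ExecutorService workers = Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r, "bounded-call");
            t.setDaemon(true);
            return t;
        });

        // Default: same blocking behaviour as the wrapped call.
        <T> T call(Callable<T> op) throws Exception {
            return op.call();
        }

        // Opt-in: give up after timeoutMs and surface a TimeoutException.
        <T> T call(Callable<T> op, long timeoutMs) throws Exception {
            Future<T> result = workers.submit(op);
            try {
                return result.get(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                result.cancel(true); // interrupts the worker; the operation may ignore it
                throw e;
            } catch (ExecutionException e) {
                // Re-throw the operation's own exception rather than the wrapper.
                Throwable cause = e.getCause();
                throw (cause instanceof Exception) ? (Exception) cause : e;
            }
        }

        @Override
        public void close() {
            workers.shutdownNow();
        }
    }

    // Demo helper: a quick operation completes well within its timeout.
    static String fastCall() {
        try (OptionalTimeoutWrapper client = new OptionalTimeoutWrapper()) {
            return client.call(() -> "controller-id", 1_000);
        } catch (Exception e) {
            return null;
        }
    }

    // Demo helper: an operation that would block for 10 s is abandoned after 100 ms.
    static boolean slowCallTimesOut() {
        try (OptionalTimeoutWrapper client = new OptionalTimeoutWrapper()) {
            client.call(() -> { Thread.sleep(10_000); return "late"; }, 100);
            return false;
        } catch (TimeoutException e) {
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(fastCall());         // controller-id
        System.out.println(slowCallTimesOut()); // true
    }
}
```

One trade-off of this design: a timed-out operation can keep a worker parked until the underlying client gives up, which is why the workers are daemon threads in a cached pool rather than a single shared thread.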
Does this patch look like something worth using?
-Jaikiran
On Thursday 29 January 2015 10:41 PM, Neha Narkhede wrote:
Ewen is right. ZkClient APIs are blocking, and the right fix for this seems
to be patching ZkClient. At some point, if we find ourselves fiddling too
much with ZkClient, it wouldn't hurt to write our own little Zookeeper
client wrapper.
On Thu, Jan 29, 2015 at 12:57 AM, Ewen Cheslack-Postava <e...@confluent.io> wrote:
Looks like a bug to me: the underlying ZK library wraps a lot of blocking
method implementations with waitUntilConnected() calls without any
timeouts. Ideally we could just add a version of ZkUtils.getController()
with a timeout, but I don't see an easy way to accomplish that with
ZkClient.
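To make "a version with a timeout" concrete, here is a minimal Java sketch of a deadline-bounded wait. It assumes a lock/condition pair guarding a connected flag, similar to the AbstractQueuedSynchronizer awaitUntil frame visible in the thread dump later in this thread. ConnectionState, waitUntilConnected(long, TimeUnit) and readWithTimeout are hypothetical names, not the ZkClient or ZkUtils API. Instead of retrying until the server comes back, the read fails with a TimeoutException once the deadline passes.

```java
import java.util.Date;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

class BoundedWaitSketch {

    // Hypothetical connection-state holder; names are illustrative, not ZkClient's.
    static final class ConnectionState {
        private final ReentrantLock lock = new ReentrantLock();
        private final Condition changed = lock.newCondition();
        private boolean connected;

        // Would be called from the client's event thread on connect/disconnect.
        void setConnected(boolean value) {
            lock.lock();
            try {
                connected = value;
                changed.signalAll();
            } finally {
                lock.unlock();
            }
        }

        // Waits at most `timeout`; returns whether we are connected at the end.
        boolean waitUntilConnected(long timeout, TimeUnit unit) throws InterruptedException {
            // Deadline computed once, so spurious wake-ups don't extend the wait.
            Date deadline = new Date(System.currentTimeMillis() + unit.toMillis(timeout));
            lock.lock();
            try {
                while (!connected) {
                    // awaitUntil returns false once the deadline has passed.
                    if (!changed.awaitUntil(deadline)) {
                        return connected;
                    }
                }
                return true;
            } finally {
                lock.unlock();
            }
        }
    }

    // Hypothetical bounded read in the spirit of a getController() with a timeout.
    static <T> T readWithTimeout(ConnectionState state, Supplier<T> read, long timeoutMs)
            throws InterruptedException, TimeoutException {
        if (!state.waitUntilConnected(timeoutMs, TimeUnit.MILLISECONDS)) {
            throw new TimeoutException("not connected after " + timeoutMs + " ms");
        }
        return read.get();
    }

    // Demo helper: the server never comes back, so the read fails after ~200 ms.
    static boolean readFailsWhileDisconnected() {
        ConnectionState state = new ConnectionState();
        try {
            readWithTimeout(state, () -> "1", 200);
            return false;
        } catch (TimeoutException e) {
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Demo helper: the session comes back 50 ms later on another thread.
    static String readAfterReconnect() {
        ConnectionState state = new ConnectionState();
        new Thread(() -> {
            try { Thread.sleep(50); } catch (InterruptedException ignored) { }
            state.setConnected(true);
        }).start();
        try {
            return readWithTimeout(state, () -> "1", 2_000);
        } catch (TimeoutException | InterruptedException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(readFailsWhileDisconnected()); // true
        System.out.println(readAfterReconnect());         // 1
    }
}
```

The while loop around awaitUntil is the standard guard against spurious wake-ups; the caller decides what a timeout means (for a controlled shutdown, presumably falling back to an unclean shutdown).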
There's at least one other call to ZkUtils besides the one in the stack
trace you gave that would cause the same issue, and possibly more that
aren't directly called in that method. One ugly solution would be to use
an extra thread during shutdown to trigger timeouts, but I imagine we
have other threads that could end up blocking in similar ways.
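The "extra thread during shutdown" idea could look roughly like this Java sketch: a watchdog thread interrupts the shutdown thread if a step is still blocked after a deadline. runWithWatchdog and BlockingStep are hypothetical names, and the sketch assumes the blocked call responds to Thread.interrupt(). As Ewen points out, it does nothing for other threads that may block the same way.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class ShutdownWatchdogSketch {

    // A shutdown step that may block and can be interrupted (hypothetical).
    interface BlockingStep {
        void run() throws InterruptedException;
    }

    // Runs `step` on the calling thread; a separate watchdog thread interrupts
    // the caller if the step is still running after timeoutMs.
    // Returns true if the step finished, false if it was cut short.
    static boolean runWithWatchdog(BlockingStep step, long timeoutMs) {
        Thread caller = Thread.currentThread();
        Object guard = new Object();
        boolean[] finished = {false};
        boolean[] fired = {false};
        ScheduledExecutorService watchdog = Executors.newSingleThreadScheduledExecutor();
        watchdog.schedule(() -> {
            synchronized (guard) {
                if (!finished[0]) {
                    fired[0] = true;
                    caller.interrupt();
                }
            }
        }, timeoutMs, TimeUnit.MILLISECONDS);
        try {
            step.run();
            return true;
        } catch (InterruptedException e) {
            return false; // treated as "timed out"; a real version would log this
        } finally {
            watchdog.shutdownNow(); // cancels the alarm if it has not fired yet
            synchronized (guard) {
                finished[0] = true;
                // If the alarm raced with a step that had already returned,
                // clear the stray interrupt so it doesn't leak into later steps.
                if (fired[0]) {
                    Thread.interrupted();
                }
            }
        }
    }

    public static void main(String[] args) {
        // Simulates a controlled-shutdown read that would otherwise wait for a minute.
        System.out.println(runWithWatchdog(() -> Thread.sleep(60_000), 200)); // false
        // A step that finishes in time is left alone.
        System.out.println(runWithWatchdog(() -> Thread.sleep(10), 1_000));   // true
    }
}
```

The guard object and the fired flag cover the race where the alarm goes off just after the step returns; without them the shutdown thread could carry an unexpected interrupt into its next step.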
I filed https://issues.apache.org/jira/browse/KAFKA-1907 to track the
issue.
On Mon, Jan 26, 2015 at 6:35 AM, Jaikiran Pai <jai.forums2...@gmail.com> wrote:
The main culprit is this thread, which goes into a "forever retry connection
to a closed zookeeper" loop when I shut down Kafka (via Ctrl + C) after
Zookeeper has already been shut down. I have attached the complete thread
dump, but I don't know if it will be delivered to the mailing list.
"Thread-2" prio=10 tid=0xb3305000 nid=0x4758 waiting on condition [0x6ad69000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x70a93368> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkUntil(LockSupport.java:267)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUntil(AbstractQueuedSynchronizer.java:2130)
at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:636)
at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:619)
at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:615)
at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:679)
at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766)
at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
at kafka.utils.ZkUtils$.readDataMaybeNull(ZkUtils.scala:456)
at kafka.utils.ZkUtils$.getController(ZkUtils.scala:65)
at kafka.server.KafkaServer.kafka$server$KafkaServer$$controlledShutdown(KafkaServer.scala:194)
at kafka.server.KafkaServer$$anonfun$shutdown$1.apply$mcV$sp(KafkaServer.scala:269)
at kafka.utils.Utils$.swallow(Utils.scala:172)
at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
at kafka.utils.Utils$.swallowWarn(Utils.scala:45)
at kafka.utils.Logging$class.swallow(Logging.scala:94)
at kafka.utils.Utils$.swallow(Utils.scala:45)
at kafka.server.KafkaServer.shutdown(KafkaServer.scala:269)
at kafka.server.KafkaServerStartable.shutdown(KafkaServerStartable.scala:42)
at kafka.Kafka$$anon$1.run(Kafka.scala:42)
-Jaikiran
On Monday 26 January 2015 05:46 AM, Neha Narkhede wrote:
For a clean shutdown, the broker tries to talk to the controller and also
issues reads to zookeeper. Possibly that is where it tries to reconnect to
zk. It will help to look at the thread dump.
Thanks
Neha
On Fri, Jan 23, 2015 at 8:53 PM, Jaikiran Pai <jai.forums2...@gmail.com> wrote:
I was just playing around with the RC2 of 0.8.2 and noticed that if I
shut down Zookeeper first, I can't shut down the Kafka server at all, since
it goes into a never-ending attempt to reconnect to Zookeeper. I had to
kill the Kafka process to stop it. I tried it against trunk too and see
the same issue there. Should I file a JIRA for this and see if I can come
up with a patch?
FWIW, here are the unending (and IMO too frequent) attempts to reconnect.
I have a thread dump too, which shows that the other thread, the one
trying to complete a controlled shutdown of Kafka, is blocked forever
waiting for Zookeeper to come up. I can attach it to the JIRA.
[2015-01-24 10:15:46,278] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
[2015-01-24 10:15:47,437] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2015-01-24 10:15:47,438] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
[2015-01-24 10:15:49,056] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2015-01-24 10:15:49,057] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
[2015-01-24 10:15:50,801] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2015-01-24 10:15:50,802] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
-Jaikiran
--
Thanks,
Ewen