Ewen Cheslack-Postava created KAFKA-1907:
--------------------------------------------

             Summary: ZkClient can block controlled shutdown indefinitely
                 Key: KAFKA-1907
                 URL: https://issues.apache.org/jira/browse/KAFKA-1907
             Project: Kafka
          Issue Type: Bug
          Components: core
    Affects Versions: 0.8.2
            Reporter: Ewen Cheslack-Postava


There are some calls to ZkClient via ZkUtils in 
KafkaServer.controlledShutdown() that can block indefinitely because they 
internally call waitUntilConnected. The ZkClient API doesn't provide an 
alternative with timeouts, so fixing this will require enforcing timeouts in 
some other way.

This may be a more general issue if any non-daemon threads also call ZkUtils 
methods, since a thread blocked this way would keep the JVM from exiting.
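
One way to enforce a timeout from outside, sketched here only as an illustration and not as the actual fix: run the blocking call on a daemon thread and wait on it with a bounded Future.get. BoundedCall and callWithTimeout are hypothetical names, not part of Kafka or ZkClient. The sketch assumes the blocked call responds to interruption. If it does not, the worker thread stays stuck, but because it is a daemon thread it cannot hold up JVM exit.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper (not part of Kafka or ZkClient): bounds the time the
// caller waits for a call that has no timeout parameter of its own.
final class BoundedCall {
    // Daemon worker threads, so a call that never returns cannot keep the JVM alive.
    private static final ExecutorService EXECUTOR = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "bounded-call");
        t.setDaemon(true);
        return t;
    });

    private BoundedCall() {}

    // Returns the call's result, or fallback if the call fails or does not
    // finish within timeoutMs milliseconds.
    static <T> T callWithTimeout(Callable<T> call, long timeoutMs, T fallback) {
        Future<T> future = EXECUTOR.submit(call);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Give up waiting and interrupt the worker; the caller moves on either way.
            future.cancel(true);
            return fallback;
        } catch (ExecutionException e) {
            // The call itself threw; treat that the same as "no answer".
            return fallback;
        } catch (InterruptedException e) {
            // Preserve the caller's interrupt status and stop waiting.
            Thread.currentThread().interrupt();
            future.cancel(true);
            return fallback;
        }
    }
}
```

In controlledShutdown() the wrapped call would be something like the ZkUtils.getController lookup seen in the stack trace below, with the fallback meaning "controller unknown, skip this attempt". The timeout value and how to handle the fallback are design choices left open here.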

Stacktrace showing the issue:

{code}
"Thread-2" prio=10 tid=0xb3305000 nid=0x4758 waiting on condition [0x6ad69000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x70a93368> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkUntil(LockSupport.java:267)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUntil(AbstractQueuedSynchronizer.java:2130)
    at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:636)
    at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:619)
    at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:615)
    at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:679)
    at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766)
    at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
    at kafka.utils.ZkUtils$.readDataMaybeNull(ZkUtils.scala:456)
    at kafka.utils.ZkUtils$.getController(ZkUtils.scala:65)
    at kafka.server.KafkaServer.kafka$server$KafkaServer$$controlledShutdown(KafkaServer.scala:194)
    at kafka.server.KafkaServer$$anonfun$shutdown$1.apply$mcV$sp(KafkaServer.scala:269)
    at kafka.utils.Utils$.swallow(Utils.scala:172)
    at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
    at kafka.utils.Utils$.swallowWarn(Utils.scala:45)
    at kafka.utils.Logging$class.swallow(Logging.scala:94)
    at kafka.utils.Utils$.swallow(Utils.scala:45)
    at kafka.server.KafkaServer.shutdown(KafkaServer.scala:269)
    at kafka.server.KafkaServerStartable.shutdown(KafkaServerStartable.scala:42)
    at kafka.Kafka$$anon$1.run(Kafka.scala:42)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
