[ https://issues.apache.org/jira/browse/KAFKA-989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729666#comment-13729666 ]
Jun Rao commented on KAFKA-989:
-------------------------------

Hmm, in the shutdown logic of the consumer connector, we set zkclient to null last. So all fetchers and the leader finder thread should already have been stopped by the time zkclient is null.

> Race condition shutting down high-level consumer results in spinning background thread
> --------------------------------------------------------------------------------------
>
>                 Key: KAFKA-989
>                 URL: https://issues.apache.org/jira/browse/KAFKA-989
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8
>         Environment: Ubuntu Linux x64
>            Reporter: Phil Hargett
>         Attachments: KAFKA-989-failed-to-find-leader.patch, KAFKA-989-failed-to-find-leader-patch2.patch
>
>
> An application that uses the Kafka client under load can often hit this issue within a few hours.
> High-level consumers come and go over this application's lifecycle, but there are a variety of defenses that ensure each high-level consumer lasts several seconds before being shut down. Nevertheless, some race is causing this background thread to continue long after the ZKClient it is using has been disconnected. Since the thread was spawned by a consumer that has already been shut down, the application has no way to find this thread and stop it.
> Reported on the users-kafka mailing list 6/25 as "0.8 throwing exception 'Failed to find leader' and high-level consumer fails to make progress".
> The only remedy is to shut down the application and restart it. Externally detecting that this state has occurred is not pleasant: one needs to grep the log for repeated occurrences of the same exception.
> Stack trace:
> Failed to find leader for Set([topic6,0]): java.lang.NullPointerException
>         at org.I0Itec.zkclient.ZkClient$2.call(ZkClient.java:416)
>         at org.I0Itec.zkclient.ZkClient$2.call(ZkClient.java:413)
>         at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
>         at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:413)
>         at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:409)
>         at kafka.utils.ZkUtils$.getChildrenParentMayNotExist(ZkUtils.scala:438)
>         at kafka.utils.ZkUtils$.getAllBrokersInCluster(ZkUtils.scala:75)
>         at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:63)
>         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
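
For illustration only, the shutdown ordering described in the comment (stop the background threads first, set the client to null last) can be sketched roughly as below. This is not Kafka's code: the class ShutdownOrderSketch, its fields, and the plain Object standing in for the ZooKeeper client are all hypothetical names invented for this sketch. It only shows the invariant the comment relies on, namely that the worker has fully exited before the shared client reference is cleared.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a connector that owns a shared client and one
// background thread. None of these names come from Kafka.
public class ShutdownOrderSketch {
    // Stand-in for the ZooKeeper client; volatile so the worker would see a null write.
    private volatile Object client = new Object();
    private final AtomicBoolean running = new AtomicBoolean(true);
    // Counts loop iterations in which the worker observed a null client.
    private final AtomicInteger nullClientSeen = new AtomicInteger();
    private final Thread worker;

    public ShutdownOrderSketch() {
        worker = new Thread(() -> {
            while (running.get()) {
                if (client == null) {
                    // In the reported bug, this is where the NullPointerException would surface.
                    nullClientSeen.incrementAndGet();
                }
                Thread.yield();
            }
        }, "leader-finder-sketch");
        worker.start();
    }

    public void shutdown() {
        running.set(false);          // 1. ask the worker to stop
        try {
            worker.join();           // 2. wait until it has actually exited
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;                  // worker may still be running, so keep the client
        }
        client = null;               // 3. only now release the shared client
    }

    public boolean workerAlive() { return worker.isAlive(); }

    public int nullClientSeen() { return nullClientSeen.get(); }

    public static void main(String[] args) {
        ShutdownOrderSketch connector = new ShutdownOrderSketch();
        connector.shutdown();
        System.out.println("worker alive: " + connector.workerAlive()
                + ", null client observed: " + connector.nullClientSeen());
    }
}
```

If step 3 ran before step 2 completed, or if a thread were started that shutdown() never joins, the worker could keep looping against a null client, which is the kind of symptom the reporter describes.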