[ https://issues.apache.org/jira/browse/KAFKA-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15051512#comment-15051512 ]
ASF GitHub Bot commented on KAFKA-2980:
---------------------------------------

GitHub user becketqin reopened a pull request:

    https://github.com/apache/kafka/pull/660

    KAFKA-2980 Fix deadlock when ZookeeperConsumerConnector create messag…

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/becketqin/kafka KAFKA-2980

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/660.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #660

----
commit 6ad40206f354512b1f2db1e3784754ea29415ce7
Author: Jiangjie Qin <becket....@gmail.com>
Date:   2015-12-10T19:08:15Z

    KAKFA-2980 Fix deadlock when ZookeeperConsumerConnector create message streams.

----

> ZookeeperConsumerConnector may enter deadlock if a rebalance occurs during a stream creation.
> ---------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-2980
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2980
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jiangjie Qin
>            Assignee: Jiangjie Qin
>
> The following sequence caused problems:
> 1. Multiple ZookeeperConsumerConnector instances in the same group start at
> the same time.
> 2. The user consumer thread calls createMessageStreamsByFilter().
> 3. Right before the user consumer thread enters syncedRebalance(), a
> rebalance is triggered by another consumer joining the group.
> 4. Because the watcher executor is already up and running at this point, it
> starts to rebalance. Now both the user consumer thread and the watcher
> executor are trying to rebalance.
> 5. The watcher executor wins this time. It finishes the rebalance, so the
> fetchers start to run.
> 6. After that, the user consumer thread tries to rebalance again, but it
> blocks when trying to stop the fetchers, because the fetcher threads are
> blocked putting data chunks into the data chunk queue.
> 7. Because no thread is taking messages out of the data chunk queue, the
> fetcher threads cannot make progress, and neither can the user consumer
> thread. So we have a deadlock here.
>
> The current code works if there is no fetcher thread running when
> createMessageStreams/createMessageStreamsByFilter is called. The simple fix
> is to let those two methods acquire the rebalance lock.
>
> Although this is a fix to the old consumer, it is quite small and important
> for people who are still using the old consumer, so I think it is still
> worth doing.
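
The idea behind the fix can be pictured with a small standalone Java sketch. This is not the Kafka patch: the class RebalanceLockSketch, its methods, and the event strings are all hypothetical, and the real connector is written in Scala. It only illustrates, under those assumptions, how holding one rebalance lock for the whole of stream creation keeps a watcher-triggered rebalance from starting fetchers before the user thread has finished its own rebalance.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the connector; none of these names are Kafka's API.
public class RebalanceLockSketch {
    private final ReentrantLock rebalanceLock = new ReentrantLock();
    private final List<String> events =
            Collections.synchronizedList(new ArrayList<>());

    // Rebalance path shared by the user thread and the watcher executor.
    void syncedRebalance(String caller) {
        rebalanceLock.lock();
        try {
            events.add(caller + ": stop fetchers");
            events.add(caller + ": start fetchers");
        } finally {
            rebalanceLock.unlock();
        }
    }

    // Stream creation holds the rebalance lock for the whole call, so the
    // watcher cannot run its rebalance in the middle of it.
    void createMessageStreams(CountDownLatch rebalanceTriggered) {
        rebalanceLock.lock();
        try {
            events.add("user: register streams");
            // Simulates step 3: another consumer joins right before the
            // user thread's own rebalance. The watcher wakes up here but
            // has to wait for the lock.
            rebalanceTriggered.countDown();
            syncedRebalance("user"); // reentrant: the lock is already held
        } finally {
            rebalanceLock.unlock();
        }
    }

    // Runs one user thread and one watcher thread; returns the event order.
    static List<String> run() {
        RebalanceLockSketch connector = new RebalanceLockSketch();
        CountDownLatch rebalanceTriggered = new CountDownLatch(1);
        Thread watcher = new Thread(() -> {
            try {
                rebalanceTriggered.await();
                connector.syncedRebalance("watcher");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        watcher.start();
        connector.createMessageStreams(rebalanceTriggered);
        try {
            watcher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(connector.events);
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

In this sketch the watcher's events always come after the user thread's, because the rebalance is only triggered while the user thread already holds the lock. Without the lock in createMessageStreams, the watcher could start fetchers between "register streams" and the user thread's rebalance, which is the window described in steps 3 to 5.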