[ https://issues.apache.org/jira/browse/KAFKA-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15051511#comment-15051511 ]
ASF GitHub Bot commented on KAFKA-2980:
---------------------------------------

Github user becketqin closed the pull request at:

    https://github.com/apache/kafka/pull/660

> ZookeeperConsumerConnector may enter deadlock if a rebalance occurs during a stream creation.
> ---------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-2980
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2980
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jiangjie Qin
>            Assignee: Jiangjie Qin
>
> The following sequence caused problems:
> 1. Multiple ZookeeperConsumerConnector instances in the same group start at
> the same time.
> 2. The user consumer thread calls createMessageStreamsByFilter().
> 3. Right before the user consumer thread enters syncedRebalance(), a
> rebalance is triggered by another consumer joining the group.
> 4. Because the watcher executor is already up and running at this point, it
> starts to rebalance. Now both the user consumer thread and the watcher
> executor are trying to rebalance.
> 5. The watcher executor wins this time. It finishes the rebalance, so the
> fetchers start to run.
> 6. After that, the user consumer thread tries to rebalance again, but it
> blocks when trying to stop the fetchers, because the fetcher threads are
> blocked putting data chunks into the data chunk queue.
> 7. Because no thread is taking messages out of the data chunk queue, the
> fetcher threads cannot make progress, and neither can the user consumer
> thread. So we have a deadlock here.
>
> The current code works if there is no fetcher thread running when
> createMessageStreams/createMessageStreamsByFilter is called. The simple fix
> is to let those two methods acquire the rebalance lock.
>
> Although this is a fix to the old consumer, it is quite small and important
> for people who are still using the old consumer, so I think it is still
> worth doing.
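The blocking condition in steps 6 and 7 can be reproduced outside Kafka. The following is a minimal Python sketch, not the ZookeeperConsumerConnector code: `Fetcher`, `drain`, and `simulate` are hypothetical names, and the queue capacity and timeout are arbitrary. It only illustrates why a thread blocked on a full bounded queue cannot be stopped while nothing is taking from that queue. The timeout on `join` is there so the demo reports the stall instead of hanging.

```python
import queue
import threading
import time


class Fetcher(threading.Thread):
    """Hypothetical stand-in for a fetcher thread feeding a bounded chunk queue."""

    def __init__(self, chunk_queue):
        super().__init__(daemon=True)  # daemon, so a stuck thread cannot hang the demo
        self.chunk_queue = chunk_queue
        self.running = True
        self.attempts = 0  # number of put() calls started so far

    def run(self):
        while self.running:
            self.attempts += 1
            # Blocks for as long as the queue is full and nobody takes from it.
            self.chunk_queue.put(self.attempts)


def drain(chunk_queue):
    """Stand-in for the user thread iterating over its message stream."""
    while True:
        chunk_queue.get()


def simulate(drain_queue, capacity=2, stop_timeout=1.0):
    """Return True if the fetcher could be stopped within stop_timeout seconds."""
    chunk_queue = queue.Queue(maxsize=capacity)
    fetcher = Fetcher(chunk_queue)
    fetcher.start()
    if drain_queue:
        threading.Thread(target=drain, args=(chunk_queue,), daemon=True).start()
    # Wait until the fetcher has started a put() beyond the queue capacity.
    while fetcher.attempts <= capacity:
        time.sleep(0.01)
    # Rebalance step: ask the fetcher to stop, then wait for it to exit.
    fetcher.running = False
    fetcher.join(stop_timeout)
    return not fetcher.is_alive()


if __name__ == "__main__":
    print("stopped without a consumer:", simulate(drain_queue=False))
    print("stopped with a consumer:", simulate(drain_queue=True))
```

Without a consumer, the fetcher stays parked in `put()` and the stop request times out. With a consumer draining the queue, the same stop request completes. This models only the stall itself. The fix proposed in the issue avoids reaching that state by having stream creation hold the rebalance lock.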
-- This message was sent by Atlassian JIRA (v6.3.4#6332)