[ https://issues.apache.org/jira/browse/KAFKA-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15051511#comment-15051511 ]

ASF GitHub Bot commented on KAFKA-2980:
---------------------------------------

Github user becketqin closed the pull request at:

    https://github.com/apache/kafka/pull/660


> ZookeeperConsumerConnector may enter deadlock if a rebalance occurs during a 
> stream creation.
> ---------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-2980
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2980
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jiangjie Qin
>            Assignee: Jiangjie Qin
>
> The following sequence caused the problem:
> 1. Multiple ZookeeperConsumerConnector instances in the same group start at 
> the same time.
> 2. The user consumer thread calls createMessageStreamsByFilter().
> 3. Right before the user consumer thread enters syncedRebalance(), a 
> rebalance is triggered by another consumer joining the group.
> 4. Because the watcher executor is already up and running at this point, it 
> starts to rebalance. Now both the user consumer thread and the watcher 
> executor are trying to rebalance.
> 5. The watcher executor wins this time. It finishes the rebalance, so the 
> fetchers start to run.
> 6. After that, the user consumer thread tries to rebalance again, but it 
> blocks when trying to stop the fetchers, because the fetcher threads are 
> blocked on putting data chunks into the data chunk queue.
> 7. Because no thread is taking messages out of the data chunk queue, the 
> fetcher threads cannot make progress, and neither can the user consumer 
> thread. So we have a deadlock here.
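
Steps 6 and 7 can be illustrated with a small Java sketch. This is a simplified model under stated assumptions, not Kafka's code: the class BlockedFetcherSketch, the one-slot queue, and the flag-based stop request are all hypothetical stand-ins. It only shows that a thread parked in put() on a full bounded queue cannot observe a stop request until some other thread takes an element out.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Simplified model of steps 6 and 7; all names are illustrative, not Kafka's.
class BlockedFetcherSketch {
    // Set to false to ask the fetcher to stop (no interrupt is sent).
    static volatile boolean running = true;

    /** Returns {aliveAfterStopRequest, aliveAfterQueueDrained}. */
    static boolean[] run() {
        running = true;
        // Bounded "data chunk queue" that is already full and has no consumer.
        BlockingQueue<String> chunkQueue = new ArrayBlockingQueue<>(1);
        chunkQueue.add("chunk-0");

        Thread fetcher = new Thread(() -> {
            try {
                while (running) {
                    chunkQueue.put("chunk"); // blocks while the queue is full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "fetcher");
        fetcher.start();

        try {
            // Wait until the fetcher is parked inside put().
            while (fetcher.getState() != Thread.State.WAITING) {
                Thread.sleep(10);
            }
            // "Stop the fetchers": the flag is not observed while the thread sits in put().
            running = false;
            fetcher.join(200);
            boolean aliveAfterStop = fetcher.isAlive();

            // Taking one chunk out lets put() return, so the loop sees the flag and exits.
            chunkQueue.take();
            fetcher.join();
            return new boolean[] { aliveAfterStop, fetcher.isAlive() };
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("alive after stop request: " + r[0]);
        System.out.println("alive after draining:     " + r[1]);
    }
}
```

In this model the stop request uses a bounded join so that the demo terminates. A stopper that waits without a timeout, while nothing drains the queue, would wait forever, which is the shape of the deadlock described above.
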
> The current code works if there is no fetcher thread running when 
> createMessageStreams/createMessageStreamsByFilter is called. The simple fix 
> is to let those two methods acquire the rebalance lock.
> Although this is a fix to the old consumer, it is quite small and important 
> for people who are still using the old consumer, so I think it is still 
> worth doing.
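
The proposed fix can be sketched in Java as follows. This is a minimal model, not the actual patch: the class RebalanceLockSketch, the event list, and the awaitState helper are hypothetical. It also assumes that syncedRebalance() takes the same reentrant monitor as the stream-creation method. The point it illustrates is that once stream creation holds the rebalance lock, a rebalance triggered by the watcher thread has to wait until the user thread has finished its own.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified model of the proposed fix; names follow the issue text, but this is not Kafka's code.
class RebalanceLockSketch {
    private final Object rebalanceLock = new Object();
    final List<String> events = Collections.synchronizedList(new ArrayList<>());

    // Called by both the user consumer thread and the watcher thread (assumption).
    void syncedRebalance(String caller) {
        synchronized (rebalanceLock) {
            events.add(caller + ": rebalance");
        }
    }

    // The fix: stream creation holds the rebalance lock from start to finish,
    // so a watcher-triggered rebalance cannot run before the user thread's own.
    void createMessageStreams(Thread watcher) {
        synchronized (rebalanceLock) {
            events.add("user: streams registered");
            watcher.start();                           // another consumer joins right now
            awaitState(watcher, Thread.State.BLOCKED); // watcher is waiting for the lock
            syncedRebalance("user");                   // reentrant: same thread, same monitor
        }
    }

    static void awaitState(Thread t, Thread.State s) {
        try {
            while (t.getState() != s) {
                Thread.sleep(10);
            }
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    static List<String> run() {
        RebalanceLockSketch connector = new RebalanceLockSketch();
        Thread watcher = new Thread(() -> connector.syncedRebalance("watcher"), "watcher");
        connector.createMessageStreams(watcher);
        try {
            watcher.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(connector.events);
    }
}
```

Calling RebalanceLockSketch.run() records the user thread's registration and rebalance first and the watcher's rebalance last. Without the outer synchronized block in createMessageStreams, the watcher could rebalance between the two user-thread steps, which is the interleaving from steps 3 to 5.
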



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
