[ 
https://issues.apache.org/jira/browse/KAFKA-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15051512#comment-15051512
 ] 

ASF GitHub Bot commented on KAFKA-2980:
---------------------------------------

GitHub user becketqin reopened a pull request:

    https://github.com/apache/kafka/pull/660

    KAFKA-2980 Fix deadlock when ZookeeperConsumerConnector create messag…

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/becketqin/kafka KAFKA-2980

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/660.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #660
    
----
commit 6ad40206f354512b1f2db1e3784754ea29415ce7
Author: Jiangjie Qin <becket....@gmail.com>
Date:   2015-12-10T19:08:15Z

    KAKFA-2980 Fix deadlock when ZookeeperConsumerConnector create message streams.

----


> ZookeeperConsumerConnector may enter deadlock if a rebalance occurs during a 
> stream creation.
> ---------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-2980
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2980
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jiangjie Qin
>            Assignee: Jiangjie Qin
>
> The following sequence caused problems:
> 1. Multiple ZookeeperConsumerConnector instances in the same group start at
> the same time.
> 2. The user consumer thread calls createMessageStreamsByFilter().
> 3. Right before the user consumer thread enters syncedRebalance(), a
> rebalance is triggered by another consumer joining the group.
> 4. Because the watcher executor is already up and running at this point, it
> starts to rebalance. Now both the user consumer thread and the watcher
> executor are trying to rebalance.
> 5. The watcher executor wins this time. It finishes the rebalance, so the
> fetchers start to run.
> 6. After that, the user consumer thread tries to rebalance again, but it
> blocks when trying to stop the fetchers, because the fetcher threads are
> blocked on putting data chunks into the data chunk queue.
> 7. In this case, because no thread is taking messages out of the data chunk
> queue, the fetcher threads cannot make progress, and neither can the user
> consumer thread. So we have a deadlock here.
> The current code works if there is no fetcher thread running when 
> createMessageStreams/createMessageStreamsByFilter is called. The simple fix 
> is to let those two methods acquire the rebalance lock.
> Although this is a fix to the old consumer, it is quite small and important
> for people who are still using the old consumer, so I think it is still
> worth doing.
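
The proposed fix can be illustrated with a small toy model. This is only a sketch and not the actual Kafka patch: the class RebalanceLockSketch, the events list, and the afterWatcherRegistered hook are hypothetical names made up for the illustration, and the real methods do far more work. The sketch assumes a reentrant lock, so that a nested rebalance call on the same thread could re-acquire it. It shows that when stream creation holds the rebalance lock from watcher registration through its own initial rebalance, a rebalance triggered on the watcher thread in between has to wait.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Toy model only; every name here is hypothetical and none of this is Kafka code.
class RebalanceLockSketch {
    // Stands in for the connector's rebalance lock.
    private final ReentrantLock rebalanceLock = new ReentrantLock();
    // Records the order in which the modelled steps happen.
    private final List<String> events =
            Collections.synchronizedList(new ArrayList<String>());

    // Models a rebalance run by the watcher executor thread.
    void watcherRebalance() {
        rebalanceLock.lock();
        try {
            events.add("watcher-rebalance");
        } finally {
            rebalanceLock.unlock();
        }
    }

    // Models stream creation on the user thread with the fix applied:
    // the lock is held from watcher registration through the initial rebalance.
    void createMessageStreams(Runnable afterWatcherRegistered) {
        rebalanceLock.lock();
        try {
            events.add("register-watcher");
            // Hook for the window in which another consumer may join the group.
            afterWatcherRegistered.run();
            events.add("initial-rebalance");
        } finally {
            rebalanceLock.unlock();
        }
    }

    // Triggers a watcher rebalance inside that window and returns the event order.
    static List<String> demo() {
        final RebalanceLockSketch connector = new RebalanceLockSketch();
        final Thread watcher = new Thread(connector::watcherRebalance);
        connector.createMessageStreams(watcher::start);
        try {
            watcher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(connector.events);
    }

    public static void main(String[] args) {
        // Prints [register-watcher, initial-rebalance, watcher-rebalance]
        System.out.println(demo());
    }
}
```

In this model the watcher thread cannot record its rebalance until the user thread releases the lock, so the initial rebalance always completes first, regardless of thread scheduling.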



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
