[ 
https://issues.apache.org/jira/browse/KAFKA-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao updated KAFKA-937:
--------------------------

    Attachment: kafka-937.patch

Attached a patch. The fix is to make sure that the fetcher thread never gets 
blocked, no matter what other threads such as the LeaderFinderThread do. 
Specifically, LeaderFinderThread no longer holds the lock when calling 
addFetcher() or shutdownIdleFetcherThreads(). This way 
ConsumerFetcherManager.addPartitionsWithError() never gets blocked, which in 
turn means that the ConsumerFetcherThread never gets blocked and can complete 
the shutdown if required.
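
For illustration only (this is not the code in kafka-937.patch), the
lock-scoping idea above can be sketched in Java. The class
FetcherManagerSketch, its fields, and the method leaderFinderPass() are
hypothetical stand-ins; only the names addPartitionsWithError(), addFetcher()
and shutdownIdleFetcherThreads() come from the description. The sketch assumes
the shared state is a set of partitions without a leader, guarded by a single
lock.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for ConsumerFetcherManager; illustrative only.
final class FetcherManagerSketch {
    private final ReentrantLock lock = new ReentrantLock();
    // Assumed shared state: partitions that currently have no known leader.
    private final Set<String> partitionsWithoutLeader = new LinkedHashSet<>();

    // Exposed only so the sketch can be checked.
    final List<String> assigned = new ArrayList<>();
    boolean lockHeldDuringBlockingCall = false;

    // Called from the fetcher thread. It holds the lock only briefly, so it
    // cannot get stuck behind a long-running call made by another thread.
    void addPartitionsWithError(List<String> partitions) {
        lock.lock();
        try {
            partitionsWithoutLeader.addAll(partitions);
        } finally {
            lock.unlock();
        }
    }

    // One pass of a leader-finder loop (hypothetical method name).
    void leaderFinderPass() {
        List<String> toAssign;
        lock.lock();
        try {
            // Snapshot and clear the shared state while holding the lock.
            toAssign = new ArrayList<>(partitionsWithoutLeader);
            partitionsWithoutLeader.clear();
        } finally {
            lock.unlock();
        }
        // Potentially blocking calls run only after the lock is released.
        addFetcher(toAssign);
        shutdownIdleFetcherThreads();
    }

    private void addFetcher(List<String> partitions) {
        lockHeldDuringBlockingCall |= lock.isHeldByCurrentThread();
        assigned.addAll(partitions);
    }

    private void shutdownIdleFetcherThreads() {
        lockHeldDuringBlockingCall |= lock.isHeldByCurrentThread();
        // A real implementation would wait here for idle fetcher threads to
        // finish, which is why the lock must not be held at this point.
    }
}
```

Because the blocking calls happen outside the critical section, a fetcher
thread that is being shut down can still enter addPartitionsWithError() and
return, so the shutdown can complete.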

Double-checked other paths and don't see any other potential deadlocks.

Also fixed another potential socket leak through SimpleConsumer. When we 
shut down a fetcher, we first interrupt the fetcher thread and close the 
SimpleConsumer. However, after that, it is possible for the fetcher thread to 
make another fetch request on SimpleConsumer, which establishes the socket 
connection again. Added a fix in SimpleConsumer so that after it is closed, no 
new socket connections will be established and the fetch call will get a 
ClosedChannelException instead.
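
A minimal Java sketch of this closed-flag guard follows. It is an
illustration under assumptions, not the SimpleConsumer change in the patch:
the class GuardedConsumer, its fields, and the simulated fetch() are
hypothetical, and a boolean stands in for the real socket connection.

```java
import java.nio.channels.ClosedChannelException;

// Hypothetical stand-in for SimpleConsumer; illustrative only.
final class GuardedConsumer {
    private final Object lock = new Object();
    private boolean closed = false;     // set once by close(), never reset
    private boolean connected = false;  // stands in for an open socket
    private int connectCount = 0;       // how many times we "connected"

    // Simulated fetch: connects lazily unless the consumer has been closed.
    String fetch(String request) throws ClosedChannelException {
        synchronized (lock) {
            if (closed) {
                // Refuse to reconnect after close(), so no socket is leaked.
                throw new ClosedChannelException();
            }
            if (!connected) {
                connected = true;
                connectCount++;
            }
            return "response:" + request;
        }
    }

    // Drops the connection and marks the consumer as permanently closed.
    void close() {
        synchronized (lock) {
            connected = false;
            closed = true;
        }
    }

    boolean isConnected() {
        synchronized (lock) {
            return connected;
        }
    }

    int connectCount() {
        synchronized (lock) {
            return connectCount;
        }
    }
}
```

With this guard, a fetch() that races with close() and arrives late fails with
ClosedChannelException instead of silently reopening the connection.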
                
> ConsumerFetcherThread can deadlock
> ----------------------------------
>
>                 Key: KAFKA-937
>                 URL: https://issues.apache.org/jira/browse/KAFKA-937
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jun Rao
>            Assignee: Jun Rao
>         Attachments: kafka-937.patch
>
>
> We have the following access pattern that can introduce a deadlock.
> AbstractFetcherThread.processPartitionsWithError() ->
> ConsumerFetcherThread.processPartitionsWithError() -> 
> ConsumerFetcherManager.addPartitionsWithError() waits for the lock ->
> LeaderFinderThread holds the lock while calling 
> AbstractFetcherManager.shutdownIdleFetcherThreads() ->
> AbstractFetcherManager calling fetcher.shutdown, which needs to wait until 
> AbstractFetcherThread.processPartitionsWithError() completes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
