[https://issues.apache.org/jira/browse/KAFKA-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16200088#comment-16200088]
ASF GitHub Bot commented on KAFKA-6051:
---------------------------------------
GitHub user mayt opened a pull request:
https://github.com/apache/kafka/pull/4056
KAFKA-6051 Close the ReplicaFetcherBlockingSend earlier on shutdown
Rearranged the testAddPartitionDuringDeleteTopic() test to preserve the
likelihood of reproducing the race condition.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mayt/kafka KAFKA-6051
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/kafka/pull/4056.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #4056
----
commit 36c1fa6ca3bab4dc070910cba9223f4141982d82
Author: Maytee Chinavanichkit <[email protected]>
Date: 2017-10-11T10:35:54Z
KAFKA-6051 Close the ReplicaFetcherBlockingSend earlier on shutdown
Rearranged the testAddPartitionDuringDeleteTopic() test to preserve the
likelihood of reproducing the race condition.
----
> ReplicaFetcherThread should close the ReplicaFetcherBlockingSend earlier on
> shutdown
> ------------------------------------------------------------------------------------
>
> Key: KAFKA-6051
> URL: https://issues.apache.org/jira/browse/KAFKA-6051
> Project: Kafka
> Issue Type: Bug
> Reporter: Maytee Chinavanichkit
>
> The ReplicaFetcherBlockingSend works as designed and will block until it is
> able to get data. This becomes a problem when we are gracefully shutting down
> a broker. The controller will attempt to shut down the fetchers and elect new
> leaders. When the last fetched partition is removed as part of the
> {replicaManager.becomeLeaderOrFollower} call, the broker proceeds to shut
> down any idle ReplicaFetcherThread. The shutdown process here can block until
> the last fetch request completes. This blocking delay is a big problem
> because the {replicaStateChangeLock} and {mapLock} in
> {AbstractFetcherManager} are still held, causing latency spikes on multiple
> brokers.
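> To illustrate the blocking chain, here is a condensed paraphrase (not the
> verbatim source) of {AbstractFetcherManager.shutdownIdleFetcherThreads()},
> which joins each idle fetcher thread while still holding {mapLock}:
> {code}
> def shutdownIdleFetcherThreads(): Unit = {
>   mapLock synchronized { // held for the entire loop below
>     val idle = fetcherThreadMap.filter { case (_, t) => t.partitionCount <= 0 }
>     idle.foreach { case (key, thread) =>
>       // shutdown() = initiateShutdown() + awaitShutdown(); awaitShutdown()
>       // joins the fetcher thread, which may be parked in a blocking fetch
>       // until the in-flight request completes
>       thread.shutdown()
>       fetcherThreadMap -= key
>     }
>   }
> }
> {code}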
> At this point, we do not need the last response since the fetcher is
> shutting down. We should close the leaderEndpoint early, during
> {initiateShutdown()}, instead of after {super.shutdown()}.
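> A minimal sketch of the proposed fix (paraphrased from the pull request, not
> the verbatim diff): override {initiateShutdown()} in ReplicaFetcherThread so
> the blocking send is closed before {awaitShutdown()} joins the thread, which
> unblocks any in-flight fetch:
> {code}
> override def initiateShutdown(): Boolean = {
>   val justShutdown = super.initiateShutdown()
>   if (justShutdown)
>     leaderEndpoint.close() // wakes the fetcher out of its blocking poll
>   justShutdown
> }
> {code}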
> For example, the log below shows a shutdown that blocked the broker from
> processing further replica changes for ~500 ms:
> {code}
> [2017-09-01 18:11:42,879] INFO [ReplicaFetcherThread-0-2], Shutting down
> (kafka.server.ReplicaFetcherThread)
> [2017-09-01 18:11:43,314] INFO [ReplicaFetcherThread-0-2], Stopped
> (kafka.server.ReplicaFetcherThread)
> [2017-09-01 18:11:43,314] INFO [ReplicaFetcherThread-0-2], Shutdown completed
> (kafka.server.ReplicaFetcherThread)
> {code}
--