[ https://issues.apache.org/jira/browse/KAFKA-6029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16199314#comment-16199314 ]
Jun Rao commented on KAFKA-6029:
--------------------------------

One of the issues that can lead to this is that the follower and the leader may receive LeaderAndIsrRequests at different times. So, if the leader receives a LeaderAndIsrRequest with a reduced ISR due to controlled shutdown of the follower, but the follower continues to fetch (since it hasn't received the LeaderAndIsrRequest yet), the follower will be added back to the ISR. Then, when the follower shuts down, we have to wait for replica.lag.time.max.ms for the follower to be dropped out of the ISR.

Onur and I discussed this a bit. One way to improve this is for the LeaderAndIsrRequest to indicate that a replica is about to go down, so that the leader doesn't add it back to the ISR. That indication could be piggy-backed on a broker epoch, which is needed in https://issues.apache.org/jira/browse/KAFKA-1120.

> Controller should wait for the leader migration to finish before ack a
> ControlledShutdownRequest
> ------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-6029
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6029
>             Project: Kafka
>          Issue Type: Improvement
>          Components: controller, core
>    Affects Versions: 1.0.0
>            Reporter: Jiangjie Qin
>             Fix For: 1.1.0
>
>
> In the controlled shutdown process, the controller will return the
> ControlledShutdownResponse immediately after the state machine is updated.
> Because the LeaderAndIsrRequests and UpdateMetadataRequests may not have been
> successfully processed by the brokers, the leader migration and active ISR
> shrink may not have been done when the shutting down broker proceeds to shut down.
> This will cause some of the leaders to take up to replica.lag.time.max.ms to
> kick the broker out of ISR. Meanwhile the produce purgatory size will grow.
> Ideally, the controller should wait until all the LeaderAndIsrRequests and
> UpdateMetadataRequests have been acked before sending back the
> ControlledShutdownResponse.
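
To make the race in the comment concrete, here is a toy model of the leader's ISR bookkeeping. It is only a sketch under stated assumptions: `PartitionLeader`, `on_leader_and_isr`, `on_caught_up_fetch`, `shutting_down` and `honor_hint` are hypothetical names, not Kafka APIs, and the "about to go down" hint is modeled as a plain set of broker ids rather than a broker epoch.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLeader:
    """Toy stand-in for a partition leader; not Kafka's actual implementation."""
    isr: set = field(default_factory=set)
    # Hypothetical hint: replicas the controller has flagged as about to go down.
    shutting_down: set = field(default_factory=set)

    def on_leader_and_isr(self, new_isr, shutting_down=()):
        # Apply the ISR sent by the controller, plus the assumed shutdown hint.
        self.isr = set(new_isr)
        self.shutting_down = set(shutting_down)

    def on_caught_up_fetch(self, replica_id, honor_hint):
        # A caught-up follower fetch normally expands the ISR.
        if honor_hint and replica_id in self.shutting_down:
            return  # proposed behavior: do not re-add a replica that is going down
        self.isr.add(replica_id)


def simulate(honor_hint):
    leader = PartitionLeader(isr={1, 2, 3})
    # The controller shrinks the ISR because broker 3 asked for controlled shutdown.
    leader.on_leader_and_isr({1, 2}, shutting_down={3})
    # Broker 3 has not processed its own LeaderAndIsrRequest yet and keeps fetching.
    leader.on_caught_up_fetch(3, honor_hint)
    return sorted(leader.isr)


if __name__ == "__main__":
    print("without hint:", simulate(honor_hint=False))
    print("with hint:   ", simulate(honor_hint=True))
```

Without the hint, broker 3 ends up back in the ISR and the leader would have to wait for replica.lag.time.max.ms to drop it again after it shuts down. With the hint, the late fetch is ignored and the ISR stays shrunk.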