[ https://issues.apache.org/jira/browse/KAFKA-6029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844388#comment-16844388 ]
Jason Gustafson edited comment on KAFKA-6029 at 5/20/19 11:36 PM:
------------------------------------------------------------------

I think we can actually resolve this as an unintended benefit of [KIP-320|https://cwiki.apache.org/confluence/display/KAFKA/KIP-320%3A+Allow+fetchers+to+detect+and+handle+log+truncation]. When the controller shrinks the ISR, it bumps the epoch. The bumped epoch prevents the shutting down follower from being added back to the ISR. The controller may still send a LeaderAndIsr request to the shutting down broker with the updated epoch, but the shutting down broker will not restart the fetcher.

> Controller should wait for the leader migration to finish before ack a ControlledShutdownRequest
> ------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-6029
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6029
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: controller, core
>    Affects Versions: 1.0.0
>            Reporter: Jiangjie Qin
>            Assignee: Zhanxiang (Patrick) Huang
>            Priority: Major
>
> In the controlled shutdown process, the controller will return the
> ControlledShutdownResponse immediately after the state machine is updated.
> Because the LeaderAndIsrRequests and UpdateMetadataRequests may not have been
> successfully processed by the brokers, the leader migration and active ISR
> shrink may not have completed when the shutting down broker proceeds to shut down.
> This will cause some of the leaders to take up to replica.lag.time.max.ms to
> kick the broker out of the ISR. Meanwhile, the produce purgatory size will grow.
> Ideally, the controller should wait until all the LeaderAndIsrRequests and
> UpdateMetadataRequests have been acked before sending back the
> ControlledShutdownResponse.
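
The epoch fencing described in the comment above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Kafka's actual code: the class `PartitionLeader` and its methods are hypothetical names, "the epoch" is assumed to mean the partition's leader epoch, and a successful fetch is treated as enough to rejoin the ISR, which is a deliberate simplification. The error names mirror the `FENCED_LEADER_EPOCH` and `UNKNOWN_LEADER_EPOCH` codes introduced with KIP-320.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLeader:
    """Toy leader-side state for one partition (hypothetical, not a Kafka class)."""
    leader_epoch: int
    isr: set = field(default_factory=set)

    def shrink_isr(self, broker_id: int) -> None:
        # Models the controller removing a replica from the ISR.
        # The epoch is bumped as part of the same change.
        self.isr.discard(broker_id)
        self.leader_epoch += 1

    def handle_fetch(self, broker_id: int, fetch_epoch: int) -> str:
        # A fetch carrying an older epoch is rejected, so the replica
        # cannot make progress and cannot be added back to the ISR.
        if fetch_epoch < self.leader_epoch:
            return "FENCED_LEADER_EPOCH"
        if fetch_epoch > self.leader_epoch:
            return "UNKNOWN_LEADER_EPOCH"
        # Simplification: any accepted fetch counts as "caught up".
        self.isr.add(broker_id)
        return "NONE"


if __name__ == "__main__":
    partition = PartitionLeader(leader_epoch=5, isr={1, 2, 3})
    partition.shrink_isr(3)  # broker 3 is shutting down
    # Broker 3 never restarts its fetcher, so it keeps using epoch 5.
    print(partition.handle_fetch(3, fetch_epoch=5))
    print(sorted(partition.isr))
```

In this model the shutting down broker (id 3) still fetches with the old epoch, so the leader fences it and the ISR stays at the shrunken set. A follower that has picked up the new epoch would be accepted as usual.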