[ https://issues.apache.org/jira/browse/KAFKA-999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Swapnil Ghike updated KAFKA-999:
--------------------------------

    Attachment: kafka-999-v3.patch

> Controlled shutdown never succeeds until the broker is killed
> -------------------------------------------------------------
>
>                 Key: KAFKA-999
>                 URL: https://issues.apache.org/jira/browse/KAFKA-999
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 0.8
>            Reporter: Neha Narkhede
>            Assignee: Swapnil Ghike
>            Priority: Critical
>         Attachments: kafka-999-v1.patch, kafka-999-v2.patch, kafka-999-v3.patch
>
>
> A race condition between the way the broker handles the leader and ISR
> request and controlled shutdown can lead to a situation where controlled
> shutdown never succeeds and the only way to bounce the broker is to kill it.
> The root cause is that the broker uses a smart check to avoid fetching from
> a leader that is not alive according to the controller. This leads to the
> broker aborting a become-follower request. When the replication factor is 2,
> leadership can never be transferred to the follower, since the follower keeps
> rejecting the become-follower request and stays out of the ISR. As a result,
> controlled shutdown fails forever.
> One sequence of events that led to this bug is as follows:
> - Broker 2 is leader and controller
> - Broker 2 is bounced (uncontrolled shutdown)
> - Controller fails over
> - Controlled shutdown is invoked on broker 1
> - Controller starts leader election for partitions that broker 2 led
> - Controller sends a become-follower request with broker 1 as the leader to
>   broker 2. At the same time, it does not include broker 1 in the alive
>   broker list sent as part of the leader and ISR request
> - Broker 2 rejects the leaderAndIsr request since the leader is not in the
>   list of alive brokers
> - Broker 1 fails to transfer leadership to broker 2 since broker 2 is not in
>   the ISR
> - Controlled shutdown can never succeed on broker 1
> Since controlled shutdown is a config option, if there are bugs in controlled
> shutdown, there is no option but to kill the broker.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira