[ https://issues.apache.org/jira/browse/KAFKA-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jun Rao updated KAFKA-3083:
---------------------------
    Description:
The following sequence can happen.

1. Broker A is the controller and is in the middle of processing a broker change event. As part of this process, let's say it's about to shrink the isr of a partition.
2. Then broker A's session expires and broker B takes over as the new controller. Broker B sends the initial leaderAndIsr request to all brokers.
3. Broker A continues by shrinking the isr of the partition in ZK and sends the new leaderAndIsr request to the broker (say C) that leads the partition. Broker C will reject this leaderAndIsr since the request comes from a controller with an older epoch. Now we could be in a situation that Broker C thinks the isr has all replicas, but the isr stored in ZK is different.

  was:
The following sequence can happen.

1. Broker A is the controller and is in the middle of processing a broker change event. As part of this process, let's say it's about to shrink the isr of a partition.
2. Then broker A's session expires and broker B takes over as the new controller. Broker B sends the initial leaderAndIsr request to all brokers.
3. Broker A continues by shrinking the isr of the partition in ZK and sends the new leaderAndIsr request to the broker (say C) that leads the partition. Broker C will reject this leaderAndIsr since the request comes from a controller with an older epoch. Now we could be in a situation that Broker C thinks the isr has all replicas, but the isr stored in ZK is different.

1. Originally, broker 12 was the controller with controller epoch 4. It received the following broker change event and was in the middle of processing this event by selecting new leaders and shrinking ISRs.
2015-12-25 09:10:57,339 INFO kafka.utils.Logging$class:68 [ZkClient-EventThread-93-ec2-107-20-175-177.compute-1.amazonaws.com:2181,ec2-107-20-175-179.compute-1.amazonaws.com:2181,ec2-107-20-175-226.compute-1.amazonaws.com:2181,ec2-107-20-175-229.compute-1.amazonaws.com:2181,ec2-107-20-175-232.compute-1.amazonaws.com:2181/kskafka/everest] [info] [BrokerChangeListener on Controller 12]: Newly added brokers: , deleted brokers: 0,10,56,42,25,20,29,1,33,9,53,41,64,59,27,49,7,39,35,11,55,8,30,19,4,47,68, all live brokers: 5,24,37,52,14,46,57,61,6,60,28,38,70,21,65,13,2,32,34,45,17,22,44,71,54,66,3,48,63,18,50,67,16,31,43,40,26,23,58,36,51,15,62

2. Then broker 12's ZK session expired and broker 30 took over as the controller with controller epoch 6.
2015-12-25 09:11:11,012 INFO kafka.utils.Logging$class:68 [ZkClient-EventThread-93-ec2-107-20-175-177.compute-1.amazonaws.com:2181,ec2-107-20-175-179.compute-1.amazonaws.com:2181,ec2-107-20-175-226.compute-1.amazonaws.com:2181,ec2-107-20-175-229.compute-1.amazonaws.com:2181,ec2-107-20-175-232.compute-1.amazonaws.com:2181/kskafka/everest] [info] [Controller 30]: Controller 30 incremented epoch to 6

3. Controller 30 read the current leaderAndIsr for [streaming_client_log,3] (with leader epoch 5) from ZK during initialization and sent it to broker 31 (the leader of streaming_client_log,3) with controller epoch 6.

4. Old controller 12 continued from step 1. It shrank the ISR for [streaming_client_log,3] and changed leader epoch to 6.
2015-12-25 09:11:13,274 INFO kafka.utils.Logging$class:68 [ZkClient-EventThread-93-ec2-107-20-175-177.compute-1.amazonaws.com:2181,ec2-107-20-175-179.compute-1.amazonaws.com:2181,ec2-107-20-175-226.compute-1.amazonaws.com:2181,ec2-107-20-175-229.compute-1.amazonaws.com:2181,ec2-107-20-175-232.compute-1.amazonaws.com:2181/kskafka/everest] [info] [Controller 12]: New leader and ISR for partition [streaming_client_log,3] is {"leader":31,"leader_epoch":6,"isr":[31]}

5. 
Old controller 12 sent leaderAndIsr to broker 31, but it was ignored since the highest controller epoch on broker 31 is 6, which is higher than the controller epoch 4 in leaderAndIsr.
2015-12-25 09:11:15,484 WARN kafka.utils.Logging$class:83 [kafka-request-handler-6] [warn] Broker 31 ignoring LeaderAndIsr request from controller 12 with correlation id 769 since its controller epoch 4 is old. Latest known controller epoch is 6

6. Old controller 12 finally received the ZK session expiration event and stopped acting as the controller.


> a soft failure in controller may leave a topic partition in an inconsistent
> state
> ----------------------------------------------------------------------------------
>
>                 Key: KAFKA-3083
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3083
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.9.0.0
>            Reporter: Jun Rao
>
> The following sequence can happen.
> 1. Broker A is the controller and is in the middle of processing a broker
> change event. As part of this process, let's say it's about to shrink the isr
> of a partition.
> 2. Then broker A's session expires and broker B takes over as the new
> controller. Broker B sends the initial leaderAndIsr request to all brokers.
> 3. Broker A continues by shrinking the isr of the partition in ZK and sends
> the new leaderAndIsr request to the broker (say C) that leads the partition.
> Broker C will reject this leaderAndIsr since the request comes from a
> controller with an older epoch. Now we could be in a situation that Broker C
> thinks the isr has all replicas, but the isr stored in ZK is different.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
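
For illustration only, the sequence reported in this issue can be replayed with a small toy model. This is a rough sketch, not Kafka code: the names `LeaderAndIsr`, `Broker`, `handle_leader_and_isr` and `simulate` are hypothetical, a plain dict stands in for ZK, and the starting ISR of [31, 0] is an assumption (the issue only gives the ISR after it was shrunk to [31]).

```python
from dataclasses import dataclass, field


@dataclass
class LeaderAndIsr:
    leader: int
    leader_epoch: int
    isr: list


@dataclass
class Broker:
    """Toy broker that remembers the highest controller epoch it has seen."""
    broker_id: int
    highest_controller_epoch: int = 0
    cached: dict = field(default_factory=dict)

    def handle_leader_and_isr(self, controller_epoch, partition, state):
        # A request from a controller with an older epoch is ignored.
        if controller_epoch < self.highest_controller_epoch:
            return False
        self.highest_controller_epoch = controller_epoch
        self.cached[partition] = state
        return True


def simulate():
    partition = ("streaming_client_log", 3)
    # Dict standing in for ZK. The starting ISR [31, 0] is assumed.
    zk = {partition: LeaderAndIsr(leader=31, leader_epoch=5, isr=[31, 0])}
    broker_31 = Broker(31)

    # New controller 30 (controller epoch 6) reads the state from ZK
    # during initialization and sends it to the partition leader.
    current = zk[partition]
    snapshot = LeaderAndIsr(current.leader, current.leader_epoch, list(current.isr))
    accepted_new = broker_31.handle_leader_and_isr(6, partition, snapshot)

    # Old controller 12 (controller epoch 4) continues and shrinks the ISR
    # in ZK. Nothing in this toy store checks the controller epoch on write,
    # which matches the sequence described in the issue.
    shrunk = LeaderAndIsr(leader=31, leader_epoch=6, isr=[31])
    zk[partition] = shrunk

    # Old controller 12 then sends the new state to broker 31, which ignores it.
    accepted_old = broker_31.handle_leader_and_isr(4, partition, shrunk)

    return accepted_new, accepted_old, broker_31.cached[partition], zk[partition]


if __name__ == "__main__":
    accepted_new, accepted_old, cached, stored = simulate()
    print("request from controller epoch 6 accepted:", accepted_new)
    print("request from controller epoch 4 accepted:", accepted_old)
    print("broker 31 view:", cached)
    print("ZK view:       ", stored)
```

In this model the leader keeps the ISR and leader epoch it received from the new controller, while the store holds the shrunk ISR written by the old controller, which is the inconsistency described above.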