[ https://issues.apache.org/jira/browse/KAFKA-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15022117#comment-15022117 ]
jin xing commented on KAFKA-2334:
---------------------------------

After receiving the LeaderAndIsrRequest, broker B2 will finally call Partition::makeLeader; part of the code is as below:

...
zkVersion = leaderAndIsr.zkVersion
leaderReplicaIdOpt = Some(localBrokerId)
// construct the high watermark metadata for the new leader replica
val newLeaderReplica = getReplica().get
newLeaderReplica.convertHWToLocalOffsetMetadata()
// reset log end offset for remote replicas
assignedReplicas.foreach(r => if (r.brokerId != localBrokerId) r.logEndOffset = LogOffsetMetadata.UnknownOffsetMetadata)
// we may need to increment high watermark since ISR could be down to 1
maybeIncrementLeaderHW(newLeaderReplica)
if (topic == OffsetManager.OffsetsTopicName)
  offsetManager.loadOffsetsFromLog(partitionId)
...

I can tell that broker B2 first sets leaderReplicaIdOpt = Some(localBrokerId) and only then tries to update the high watermark. Once leaderReplicaIdOpt is set, broker B2 becomes visible to consumers (if a consumer sends a FetchRequest, there will be no NotLeaderForPartitionException). So in the short interval after leaderReplicaIdOpt = Some(localBrokerId) is set and before the high watermark is updated, what the consumer gets is the "gone back" HW. If my understanding is right, simply reversing the order of setting leaderReplicaIdOpt and updating the high watermark would fix this issue. Am I wrong?

> Prevent HW from going back during leader failover
> --------------------------------------------------
>
>                 Key: KAFKA-2334
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2334
>             Project: Kafka
>          Issue Type: Bug
>          Components: replication
>    Affects Versions: 0.8.2.1
>            Reporter: Guozhang Wang
>            Assignee: Neha Narkhede
>             Fix For: 0.10.0.0
>
>
> Consider the following scenario:
> 0. Kafka uses a replication factor of 2, with broker B1 as the leader and B2 as the follower.
> 1. A producer keeps sending to Kafka with ack=-1.
> 2. A consumer repeatedly issues ListOffset requests to Kafka.
> And the following sequence:
> 0. B1's current log-end-offset (LEO) is 0 and HW is 0; same for B2.
> 1. B1 receives a ProduceRequest of 100 messages, appends them to its local log (LEO becomes 100), and holds the request in purgatory.
> 2. B1 receives a FetchRequest starting at offset 0 from follower B2 and returns the 100 messages.
> 3. B2 appends the received messages to its local log (LEO becomes 100).
> 4. B1 receives another FetchRequest starting at offset 100 from B2, learns that B2's LEO has caught up to 100, hence updates its own HW, satisfies the ProduceRequest in purgatory, and sends the FetchResponse with HW 100 back to B2 ASYNCHRONOUSLY.
> 5. B1 successfully sends the ProduceResponse to the producer and then fails, so the FetchResponse never reaches B2, whose HW remains 0.
> From the consumer's point of view, it could first see the latest offset of 100 (from B1), then see the latest offset of 0 (from B2), and then watch the latest offset gradually catch up to 100.
> This is because we use the HW to guard ListOffset and fetch-from-ordinary-consumer requests.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
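The ordering question raised in the comment can be sketched with a toy model. This is a hypothetical illustration in plain Java (Kafka's Partition is Scala): the fields leaderReplicaIdOpt and the high watermark mirror the snippet above, but fetchHw(), makeLeaderCurrentOrder(), and makeLeaderProposedOrder() are invented names for this sketch, not Kafka APIs. It shows that a fetch landing between the two steps observes the stale "gone back" HW under the current order, whereas under the reversed order that same fetch is rejected (the broker is not yet leader), and only the updated HW is ever visible:

```java
import java.util.Optional;
import java.util.OptionalLong;

// Toy single-threaded model of the two-step leader transition; all names
// other than leaderReplicaIdOpt/highWatermark are invented for illustration.
class HwOrderingSketch {
    Optional<Integer> leaderReplicaIdOpt = Optional.empty();
    long highWatermark;

    HwOrderingSketch(long staleHw) {
        this.highWatermark = staleHw;
    }

    // Consumer-visible HW: a fetch succeeds only once the broker claims
    // leadership; otherwise the consumer gets the analog of
    // NotLeaderForPartitionException (modeled as an empty OptionalLong).
    OptionalLong fetchHw() {
        return leaderReplicaIdOpt.isPresent()
                ? OptionalLong.of(highWatermark)
                : OptionalLong.empty();
    }

    // Current order: claim leadership first, fix up the HW afterwards.
    // Returns what a fetch arriving between the two steps would observe:
    // the stale ("gone back") HW.
    OptionalLong makeLeaderCurrentOrder(int localBrokerId, long newHw) {
        leaderReplicaIdOpt = Optional.of(localBrokerId); // step 1: now serving fetches
        OptionalLong midFetch = fetchHw();               // simulated concurrent fetch
        highWatermark = newHw;                           // step 2: HW caught up
        return midFetch;
    }

    // Proposed order: update the HW first, then expose leadership.
    // The same concurrent fetch is now rejected instead of seeing a stale HW.
    OptionalLong makeLeaderProposedOrder(int localBrokerId, long newHw) {
        highWatermark = newHw;                           // step 1: HW caught up
        OptionalLong midFetch = fetchHw();               // simulated concurrent fetch
        leaderReplicaIdOpt = Optional.of(localBrokerId); // step 2: now serving fetches
        return midFetch;
    }
}
```

With a stale HW of 0 and a new HW of 100, makeLeaderCurrentOrder reports that the mid-transition fetch saw HW 0, while makeLeaderProposedOrder reports the fetch was rejected; after either method completes, fetchHw() returns 100.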