[ https://issues.apache.org/jira/browse/KAFKA-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Swapnil Ghike updated KAFKA-763:
--------------------------------

    Attachment: kafka-763-new-v1.patch

Copy-pasting the comments from patch new-v1:

1. The leader's log could partially overlap with the follower's log. In that situation, the only way to get an OffsetOutOfRangeException is for the follower's end offset to be ahead of the leader's end offset. This can happen after an unclean leader election: a follower goes down, and meanwhile the leader keeps appending messages. The follower comes back up, and before it has fully caught up with the leader's log, the ISR goes down. The follower is now uncleanly elected as the new leader and appends messages. The old leader comes back up, becomes a follower, and may find that the current leader's end offset falls between its own start offset and its own end offset. In that case, truncate the follower's log to the current leader's end offset and continue fetching. This can leave the logs of the two replicas mismatched; we don't fix this mismatch as of now.

2. Otherwise, the leader's log could be completely non-overlapping with the follower's log:

   i. The follower could have been down for a long time, and when it starts up, its end offset could be smaller than or equal to the leader's start offset because the leader has deleted old logs (log.logEndOffset <= leaderStartOffset).

   OR

   ii. Unclean leader election: a follower could be down for a long time. When it starts up, the ISR goes down before the follower has had the opportunity to even start catching up with the leader's log. The follower is now uncleanly elected as the new leader. The old leader comes back up, becomes a follower, and may find that the current leader's end offset is smaller than or equal to its own start offset (log.logStartOffset >= leaderEndOffset).

   In both cases, roll a new log at the follower with its start offset equal to the current leader's start offset and continue fetching.

Other changes:
1. Fixed the error message for autoOffsetReset in ConsumerConfig.
2. Added a method logStartOffset to Log.

> Add an option to replica from the largest offset during unclean leader
> election
> -------------------------------------------------------------------------------
>
>                 Key: KAFKA-763
>                 URL: https://issues.apache.org/jira/browse/KAFKA-763
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jun Rao
>            Assignee: Swapnil Ghike
>            Priority: Blocker
>              Labels: kafka-0.8, p2
>         Attachments: kafka-763-new-v1.patch, kafka-763_v1.patch
>
>
> If there is an unclean leader election, a follower may have an offset out of
> the range of the leader. Currently, the follower will delete all its data and
> refetch from the smallest offset of the leader. It would be useful to add an
> option to let the follower refetch from the largest offset of the leader
> since refetching from the smallest offset may take some time.
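
The two cases from the new-v1 patch comments can be illustrated with a minimal Python sketch, assuming a follower's log is modeled only by its start and end offsets. `ReplicaLog`, `roll_new_log_at`, and `handle_offset_out_of_range` are hypothetical names for illustration, not Kafka's actual classes or methods; the conditions mirror the ones stated in the comments (log.logEndOffset <= leaderStartOffset, log.logStartOffset >= leaderEndOffset).

```python
from dataclasses import dataclass


@dataclass
class ReplicaLog:
    """Hypothetical stand-in for a follower's log, tracked as [start_offset, end_offset)."""
    start_offset: int
    end_offset: int

    def truncate_to(self, offset: int) -> None:
        # Drop every message at or beyond `offset`; the start offset is unchanged.
        self.end_offset = offset

    def roll_new_log_at(self, offset: int) -> None:
        # Discard all local data and begin an empty log at `offset`.
        self.start_offset = offset
        self.end_offset = offset


def handle_offset_out_of_range(log: ReplicaLog, leader_start: int, leader_end: int) -> int:
    """Adjust `log` in place and return the offset the follower should fetch from next."""
    # Case 1: partial overlap, follower ahead of the leader (unclean election).
    # Truncate to the leader's end offset; a mismatch between replicas may remain.
    if log.start_offset < leader_end < log.end_offset:
        log.truncate_to(leader_end)
    # Case 2: no overlap, either (i) the leader deleted old logs or
    # (ii) an unclean election left the leader behind the follower's start offset.
    elif log.end_offset <= leader_start or log.start_offset >= leader_end:
        log.roll_new_log_at(leader_start)
    else:
        # Neither case applies: the follower's end offset is within the leader's range.
        raise ValueError("follower offset is within the leader's range; no reset needed")
    # Resume fetching from the (possibly new) end of the follower's log.
    return log.end_offset


if __name__ == "__main__":
    follower = ReplicaLog(start_offset=0, end_offset=120)
    print(handle_offset_out_of_range(follower, leader_start=0, leader_end=100))  # 100
```

Case 1 is checked first, but the two branches cannot both match: case 1 requires leader_end < log.end_offset while case 2(i) requires log.end_offset <= leader_start, and a leader's start offset never exceeds its end offset.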