[ https://issues.apache.org/jira/browse/KAFKA-7164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Gustafson resolved KAFKA-7164.
------------------------------------
       Resolution: Fixed
    Fix Version/s: 2.1.0
                   2.0.1
                   1.1.2

> Follower should truncate after every leader epoch change
> --------------------------------------------------------
>
>                 Key: KAFKA-7164
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7164
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jason Gustafson
>            Assignee: Bob Barrett
>            Priority: Major
>             Fix For: 1.1.2, 2.0.1, 2.1.0
>
>
> Currently we skip log truncation for followers if a LeaderAndIsr request is
> received but the leader does not change. This can lead to log divergence if
> the follower missed a leader change before the current known leader was
> re-elected. The problem is that the leader may truncate its own log before
> becoming leader again, so the follower needs to reconcile its log with the
> leader's a second time.
>
> For example, suppose that we have three replicas: r1, r2, and r3. Initially,
> r1 is the leader in epoch 0 and writes one record at offset 0. r3 replicates
> this successfully.
> {code}
> r1:
>   status: leader
>   epoch: 0
>   log: [{id: 0, offset: 0, epoch:0}]
> r2:
>   status: follower
>   epoch: 0
>   log: []
> r3:
>   status: follower
>   epoch: 0
>   log: [{id: 0, offset: 0, epoch:0}]
> {code}
> Suppose then that r2 becomes leader in epoch 1. r1 notices the leader change
> and truncates, but r3, for whatever reason, does not.
> {code}
> r1:
>   status: follower
>   epoch: 1
>   log: []
> r2:
>   status: leader
>   epoch: 1
>   log: []
> r3:
>   status: follower
>   epoch: 0
>   log: [{id: 0, offset: 0, epoch:0}]
> {code}
> Now suppose that r2 fails and r1 becomes the leader in epoch 2. It
> immediately writes a new record:
> {code}
> r1:
>   status: leader
>   epoch: 2
>   log: [{id: 1, offset: 0, epoch:2}]
> r2:
>   status: follower
>   epoch: 2
>   log: []
> r3:
>   status: follower
>   epoch: 0
>   log: [{id: 0, offset: 0, epoch:0}]
> {code}
> If the replica continues fetching with the old epoch, we can have log
> divergence as noted in KAFKA-6880. However, if r3 successfully receives the
> new LeaderAndIsr request, which updates the epoch to 2, but skips the
> truncation, then the logs will stay inconsistent.
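
The r1/r3 scenario described in the issue can be reproduced with a small simulation. This is only a sketch in Python, not Kafka's code: the names Record, Replica, on_leader_and_isr, truncate_to_leader, and fetch are hypothetical, offsets are assumed to equal list indices, and the truncation rule is a simplified version of the leader-epoch lookup (the leader returns the end offset of the largest epoch not greater than the follower's latest epoch).

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    id: int
    offset: int
    epoch: int


@dataclass
class Replica:
    name: str
    epoch: int = 0
    log: list = field(default_factory=list)

    def end_offset_for_epoch(self, requested_epoch):
        """Leader side: end offset of the largest epoch <= requested_epoch."""
        end = 0
        for rec in self.log:
            if rec.epoch <= requested_epoch:
                end = rec.offset + 1
        return end


def truncate_to_leader(follower, leader):
    # Ask the leader where the follower's latest epoch ends, then drop
    # everything at or beyond that offset.
    last_epoch = follower.log[-1].epoch if follower.log else follower.epoch
    end = leader.end_offset_for_epoch(last_epoch)
    follower.log = [rec for rec in follower.log if rec.offset < end]


def fetch(follower, leader):
    # Fetch from the follower's log end offset (offset == list index here).
    next_offset = len(follower.log)
    follower.log.extend(leader.log[next_offset:])


def on_leader_and_isr(follower, known_leader, new_leader, new_epoch,
                      truncate_on_every_epoch_change):
    leader_changed = known_leader != new_leader.name
    epoch_changed = new_epoch > follower.epoch
    follower.epoch = new_epoch
    # Behavior described as the bug: truncate only when the leader changes.
    # Behavior named in the issue title: truncate on every epoch change.
    if leader_changed or (truncate_on_every_epoch_change and epoch_changed):
        truncate_to_leader(follower, new_leader)


def run(truncate_on_every_epoch_change):
    # State at the start of epoch 2: r1 was re-elected and wrote record id 1;
    # r3 missed epoch 1 and still holds record id 0 from epoch 0.
    r1 = Replica("r1", epoch=2, log=[Record(id=1, offset=0, epoch=2)])
    r3 = Replica("r3", epoch=0, log=[Record(id=0, offset=0, epoch=0)])
    on_leader_and_isr(r3, known_leader="r1", new_leader=r1, new_epoch=2,
                      truncate_on_every_epoch_change=truncate_on_every_epoch_change)
    fetch(r3, r1)
    return [rec.id for rec in r3.log]


if __name__ == "__main__":
    print(run(False))  # r3 keeps the stale record: [0]
    print(run(True))   # r3 truncates and converges on the leader: [1]
```

With truncation skipped, r3 believes it is already at offset 1 and never sees the leader's record at offset 0, so the two logs disagree at the same offset. With truncation on every epoch change, r3 drops its epoch-0 record and re-fetches from offset 0.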