[ https://issues.apache.org/jira/browse/KAFKA-10706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Gustafson resolved KAFKA-10706.
-------------------------------------
    Fix Version/s: 2.7.1
                   2.6.1
                   2.5.2
                   2.4.2
       Resolution: Fixed

> Liveness bug in truncation protocol can lead to indefinite URP
> --------------------------------------------------------------
>
>                 Key: KAFKA-10706
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10706
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jason Gustafson
>            Assignee: Jason Gustafson
>            Priority: Major
>             Fix For: 2.4.2, 2.5.2, 2.6.1, 2.7.1
>
>
> We hit an interesting liveness condition in the truncation protocol. Broker A
> was leader in epoch 7, broker B was leader in epoch 8, and then broker A was
> leader in epoch 9 again.
> On broker A, we had the following state in the epoch cache:
> {code}
> epoch 4, start offset 3953
> epoch 7, start offset 3983
> epoch 9, start offset 3988
> {code}
> On broker B, we had the following:
> {code}
> epoch 4, start offset 3953
> epoch 8, start offset 3983
> {code}
> After A was elected, broker B sent epoch 8 in OffsetsForLeaderEpoch. Broker A
> correctly responded with epoch 7 ending at offset 3988. The end offset on
> broker B was in fact 3983, so this truncation had no effect. Broker B then
> retried with epoch 8 again and replication was stuck.
> When a replica becomes leader, it first inserts an entry into the epoch cache
> with the current log end offset. This ensures that it has a larger epoch
> in the cache than any epoch that could be requested by a valid replica.
> However, I think it is incorrect to turn around and use this epoch when
> becoming a follower. It seems like we need symmetric logic after becoming a
> follower to remove this epoch entry.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
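
For readers who want to trace the numbers in the report, here is a minimal Python sketch of the lookup and retry described above. It is an illustration under stated assumptions, not Kafka's implementation: the function names are hypothetical, broker A's log end offset is assumed to be 3988 (the report does not state it), the follower logic is reduced to "ask about the latest cached epoch, then truncate to the smaller of the leader's end offset and the local log end", and drop_empty_latest_epoch models the remedy suggested in the report, which may differ from the committed fix.

```python
from bisect import bisect_right

# Epoch caches from the report: (epoch, start_offset), sorted by epoch.
CACHE_A = [(4, 3953), (7, 3983), (9, 3988)]
CACHE_B = [(4, 3953), (8, 3983)]
LOG_END_A = 3988  # assumption: not stated in the report
LOG_END_B = 3983  # stated in the report


def end_offset_for(cache, requested_epoch, log_end_offset):
    """Return (epoch, end_offset) for the largest cached epoch <= requested_epoch."""
    epochs = [epoch for epoch, _ in cache]
    i = bisect_right(epochs, requested_epoch) - 1
    if i < 0:
        return None  # nothing cached at or below the requested epoch
    # An epoch ends where the next one starts; the latest epoch ends at the log end.
    end = cache[i + 1][1] if i + 1 < len(cache) else log_end_offset
    return epochs[i], end


def follower_round(follower_cache, follower_log_end, leader_cache, leader_log_end):
    """One simplified truncation round: (requested_epoch, leader_reply, truncate_to)."""
    requested = follower_cache[-1][0]  # follower asks about its latest cached epoch
    reply = end_offset_for(leader_cache, requested, leader_log_end)
    truncate_to = min(reply[1], follower_log_end)
    return requested, reply, truncate_to


def drop_empty_latest_epoch(cache, log_end_offset):
    """Hypothetical remedy: drop a trailing entry whose epoch holds no records."""
    if cache and cache[-1][1] >= log_end_offset:
        return cache[:-1]
    return cache


# Round as reported: B asks about epoch 8, A answers epoch 7 ending at 3988.
stuck = follower_round(CACHE_B, LOG_END_B, CACHE_A, LOG_END_A)

# With the empty epoch 8 entry removed, B asks about epoch 4 instead.
fixed_cache_b = drop_empty_latest_epoch(CACHE_B, LOG_END_B)
resolved = follower_round(fixed_cache_b, LOG_END_B, CACHE_A, LOG_END_A)
```

In the first round the reply epoch (7) differs from the requested epoch (8) and the truncation target equals B's log end offset, so B's state is unchanged and the next round is identical. Once the entry for epoch 8 is dropped, B requests epoch 4, A replies with epoch 4 ending at 3983, and the epochs agree.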