[ https://issues.apache.org/jira/browse/KAFKA-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15182552#comment-15182552 ]
ASF GitHub Bot commented on KAFKA-2960:
---------------------------------------

GitHub user becketqin opened a pull request:

    https://github.com/apache/kafka/pull/1018

    KAFKA-2960: Clear purgatory for partitions before becoming follower

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/becketqin/kafka KAFKA-2960

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/1018.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1018

----
commit 6ee590bc8f65217227c8bda98644dce35ed0d701
Author: Jiangjie Qin <becket....@gmail.com>
Date:   2016-03-07T04:04:45Z

    KAFKA-2960: Clear purgatory for partition before becoming follower

----

> DelayedProduce may cause message lose during repeatly leader change
> -------------------------------------------------------------------
>
>                 Key: KAFKA-2960
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2960
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.9.0.0
>            Reporter: Xing Huang
>            Assignee: Jiangjie Qin
>             Fix For: 0.10.0.0
>
>
> Related to #KAFKA-1148.
> When a leader replica becomes a follower and then leader again, it may have
> truncated its log while it was a follower. The second time it becomes leader,
> its ISR may shrink, and if new messages are appended at that moment, a
> DelayedProduce created during its first term as leader may be satisfied. The
> client then receives a response with no error, even though the messages were
> actually lost.
> We simulated this scenario, which confirmed that the message loss can happen.
> Judging from the broker and client logs, it also appears to be the cause of a
> data loss we experienced recently.
> I think we should check the leader epoch when sending a response, or satisfy
> the DelayedProduce on leader change as described in #KAFKA-1148.
> We may also need a new error code to inform the producer about this error.
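
For readers following the issue, the failure mode and the idea in the pull request title can be illustrated with a small, self-contained Java sketch. It is a toy model under stated assumptions, not Kafka's implementation: `PurgatorySketch`, `PendingProduce`, `park`, `tryComplete` and `becomeFollower` are hypothetical names, and using `NOT_LEADER_FOR_PARTITION` as the failure result is an assumption made only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model for illustration only; not Kafka's real classes.
class PurgatorySketch {

    // Placeholder outcomes; which error the broker should return is an assumption here.
    enum Result { NONE, NOT_LEADER_FOR_PARTITION }

    /** A produce request parked until its required offset has been replicated. */
    static final class PendingProduce {
        final long requiredOffset;
        Result result; // null while the request is still waiting

        PendingProduce(long requiredOffset) {
            this.requiredOffset = requiredOffset;
        }
    }

    // Parked requests, keyed by partition name.
    private final Map<String, List<PendingProduce>> pending = new HashMap<>();

    /** Parks a request for the given partition and returns a handle to it. */
    PendingProduce park(String partition, long requiredOffset) {
        PendingProduce op = new PendingProduce(requiredOffset);
        pending.computeIfAbsent(partition, k -> new ArrayList<>()).add(op);
        return op;
    }

    /** Completes, without error, every parked request the high watermark has reached. */
    void tryComplete(String partition, long highWatermark) {
        List<PendingProduce> ops = pending.get(partition);
        if (ops == null) {
            return;
        }
        ops.removeIf(op -> {
            if (highWatermark < op.requiredOffset) {
                return false; // still waiting for replication
            }
            op.result = Result.NONE;
            return true;
        });
    }

    /** Fails and drops all parked requests for a partition before it becomes a follower. */
    void becomeFollower(String partition) {
        List<PendingProduce> ops = pending.remove(partition);
        if (ops == null) {
            return;
        }
        for (PendingProduce op : ops) {
            op.result = Result.NOT_LEADER_FOR_PARTITION;
        }
    }

    public static void main(String[] args) {
        PurgatorySketch purgatory = new PurgatorySketch();
        PendingProduce op = purgatory.park("topic-0", 10L); // first term as leader
        purgatory.becomeFollower("topic-0");                // leadership is lost
        purgatory.tryComplete("topic-0", 12L);              // second term as leader
        System.out.println(op.result);                      // prints NOT_LEADER_FOR_PARTITION
    }
}
```

If the `becomeFollower` call is skipped in this model, the later `tryComplete` marks the stale request as `NONE`, which mirrors the silent loss described in the issue.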