[ https://issues.apache.org/jira/browse/KAFKA-7122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mickael Maison resolved KAFKA-7122.
-----------------------------------
    Resolution: Won't Fix

We are now removing ZooKeeper support so closing this issue.

> Data is lost when ZooKeeper times out
> -------------------------------------
>
>                 Key: KAFKA-7122
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7122
>             Project: Kafka
>          Issue Type: Bug
>          Components: core, replication
>    Affects Versions: 0.11.0.2
>            Reporter: Nick Lipple
>            Priority: Blocker
>
> Noticed that a Kafka cluster will lose data when the leader for a partition
> has its ZooKeeper connection time out.
> Sequence of events:
> # Say broker A leads a partition followed by brokers B and C
> # A ZK node has a network issue, and it happens to be the node used by
> broker A. Let's say this happens at offset X
> # Kafka Controller immediately selects broker C as the new partition leader
> # Broker A does not time out from ZooKeeper for another 4 seconds. Broker A
> still thinks it is the leader, presumably accepting producer writes.
> # Broker A detects the ZK timeout and leaves the ISR.
> # Broker A reconnects to ZK, rejoins cluster as follower for partition
> # Broker A truncates log to some offset Y such that Y > X. Broker A proceeds
> to catch up normally and rejoins the ISR
> # ISRs for partition are now in an inconsistent state:
> ## Broker C has all offsets X through Y plus everything after
> ## Broker B has all offsets X through Y plus everything after
> ## Broker A has offsets up to X and after Y. Everything between X and Y *IS
> MISSING*
> # Within 5 minutes, the controller triggers preferred replica election,
> making Broker A the new leader for the partition (this is default behavior)
> All consumers after step 9 will not receive any messages for offsets between
> X and Y.
>
> The root problem here seems to be that broker A truncates to offset Y when
> rejoining the cluster. It should be truncating further back to offset X to
> prevent data loss
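
The sequence of events in the report can be sketched as a toy model. This is not Kafka code: the list-based logs, the helper names `rejoin_as_follower` and `missing_offsets`, and the concrete values of X and Y are all assumptions made for illustration. It also assumes, as the report only presumes, that broker A accepted writes for offsets X up to Y while it still believed it was the leader.

```python
# Toy model of the truncation scenario; all names and numbers are hypothetical.

def rejoin_as_follower(local_log, leader_log, truncate_to):
    """Truncate the local log to `truncate_to`, then fetch the rest from the leader."""
    kept = local_log[:truncate_to]
    fetched = leader_log[truncate_to:]
    return kept + fetched


def missing_offsets(follower_log, leader_log):
    """Offsets at which the follower does not hold the leader's record."""
    return [
        offset
        for offset, record in enumerate(leader_log)
        if offset >= len(follower_log) or follower_log[offset] != record
    ]


X, Y, END = 5, 8, 12  # assumed offsets: ZK issue at X, A stops writing at Y

# Records are (offset, writer) pairs.
common = [(i, "A") for i in range(X)]                        # replicated before the ZK issue
leader_c = common + [(i, "C") for i in range(X, END)]        # C leads from offset X onward
stale_a = common + [(i, "A-stale") for i in range(X, Y)]     # A's writes as a stale leader

after_truncate_to_y = rejoin_as_follower(stale_a, leader_c, Y)  # behavior described in the report
after_truncate_to_x = rejoin_as_follower(stale_a, leader_c, X)  # behavior the reporter proposes

print(missing_offsets(after_truncate_to_y, leader_c))  # [5, 6, 7]
print(missing_offsets(after_truncate_to_x, leader_c))  # []
```

In this model, truncating to Y leaves broker A without the new leader's records for offsets X through Y - 1, so a later preferred replica election that makes A the leader would expose that gap to consumers. Truncating to X leaves A's log identical to the leader's.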