[ https://issues.apache.org/jira/browse/KAFKA-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15402740#comment-15402740 ]
Jun Rao commented on KAFKA-4009:
--------------------------------

[~aganesan], thanks for reporting this. In your test, did the producer use ack=all? Also, did N1 detect the corruption during log recovery when restarting the broker, or while appending to the log?

> Data corruption or EIO leads to data loss
> -----------------------------------------
>
>                 Key: KAFKA-4009
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4009
>             Project: Kafka
>          Issue Type: Bug
>          Components: log
>    Affects Versions: 0.9.0.0
>            Reporter: Aishwarya Ganesan
>
> I have a 3-node Kafka cluster (N1, N2, and N3) with log.flush.interval.messages=1, min.insync.replicas=3, and unclean.leader.election.enable=false, and a single ZooKeeper node. My workload inserts a few messages, and on completion of the workload the recovery-point-offset-checkpoint file reflects the latest offset of the committed messages.
>
> I have a small testing tool that drives distributed applications into corner cases by simulating error conditions such as EIO, ENOSPC, and EDQUOT that can be encountered in all modern file systems such as ext4. The tool also simulates on-disk silent data corruption.
>
> When I introduce silent data corruption on a node (say N1) in the ISR, Kafka detects the corruption using checksums and ignores the log entries from that point onwards. Even though N1 has lost log entries and the recovery-point-offset-checkpoint file on N1 indicates the latest offsets, N1 is allowed to become the leader because it is in the ISR. The other nodes, N2 and N3, then crash with the following log message:
>
> FATAL [ReplicaFetcherThread-0-1], Halting because log truncation is not allowed for topic my-topic1, Current leader 1's latest offset 0 is less than replica 3's latest offset 1 (kafka.server.ReplicaFetcherThread)
>
> The end result is that silent data corruption leads to data loss, because querying the cluster returns only the messages before the corrupted entry. Note that the cluster at this point has only N1. This situation could have been avoided if N1, the node that had to ignore the log entry, had not been allowed to become the leader. This scenario would not happen with majority-based leader election, because the other nodes (N2 or N3) would have denied N1 their votes, N1's log being incomplete compared to theirs.
>
> If this scenario happens on one of the followers instead, that follower ignores the corrupted log entry, copies the data from the leader, and there is no data loss.
>
> Encountering an EIO thrown by the file system for a particular block has the same consequence: querying the cluster shows data loss, and the remaining two nodes crash. An EIO on read can be thrown for a variety of reasons, including a latent sector error on one or more sectors of the disk.
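
For context on the ack=all question above, here is a minimal producer sketch with acks=all, assuming the standard Java client available in 0.9.0.0; the bootstrap addresses and key/value strings are placeholders, and the topic name is taken from the log message in the report.

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AckAllProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder addresses; N1/N2/N3 are the brokers from the reported setup.
        props.put("bootstrap.servers", "N1:9092,N2:9092,N3:9092");
        // acks=all: the leader waits for all in-sync replicas to acknowledge
        // each write before the send is considered successful.
        props.put("acks", "all");
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        try {
            // Topic name from the ReplicaFetcherThread log message in the report.
            // Blocking on the returned Future surfaces any send failure.
            producer.send(new ProducerRecord<>("my-topic1", "key", "value")).get();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            producer.close();
        }
    }
}

With acks=all and min.insync.replicas=3, a send should only succeed once every in-sync replica (all three brokers in this setup) has acknowledged the write, which helps narrow down where the acknowledged data was lost.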