[jira] [Closed] (KAFKA-514) Replication with Leader Failure Test: Log segment files checksum mismatch

Jun Rao (JIRA) Tue, 04 Dec 2012 15:53:01 -0800

     [ 
https://issues.apache.org/jira/browse/KAFKA-514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Jun Rao closed KAFKA-514.
-------------------------

    
> Replication with Leader Failure Test: Log segment files checksum mismatch
> -------------------------------------------------------------------------
>
>                 Key: KAFKA-514
>                 URL: https://issues.apache.org/jira/browse/KAFKA-514
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8
>            Reporter: John Fung
>            Priority: Blocker
>              Labels: replication-testing
>             Fix For: 0.8
>
>         Attachments: kafka-514-reproduce-issue.patch, kafka-514_v1.patch, 
> kafka-514_v2.patch, system_test_output_archive.tar.gz, testcase_2.tar
>
>
> Test Description:
>    1. Produce and consume messages to 1 topics and 3 partitions.
>    2. This test sends 10 messages every 2 sec to 3 replicas.
>    3. At the end verifies the log size and contents as well as using a 
> consumer to verify that there is no message loss.
> The issue:
> When the leader is terminated by a controlled failure (kill -15), the 
> resulting log segment files size are not all matching. The mismatch log 
> segment size would happen in one of the partition of the terminated broker. 
> This is consistently reproducible from the system regression test for 
> replication with the following configurations:
>     * zookeeper: 1-node (local)
>     * brokers: 3-node cluster (all local)
>     * replica factor: 3
>     * no. of topic: 1
>     * no. of partition: 2
>     * iterations of leader failure: 1
> Remarks:
>     * It is rarely reproducible if the no. of partitions is 1.
>     * Even the file checksums are not matching, the no. of messages in the 
> producer & consumer logs are equal
> Test result (shown with log file checksum):
> broker-1 :
> test_1-0/00000000000000000000.kafka => 1690639555
> test_1-1/00000000000000000000.kafka => 4068655384    <<<< not matching across 
> all replicas
> broker-2 :
> test_1-0/00000000000000000000.kafka => 1690639555
> test_1-1/00000000000000000000.kafka => 4068655384    <<<< not matching across 
> all replicas
> broker-3 :
> test_1-0/00000000000000000000.kafka => 1690639555
> test_1-1/00000000000000000000.kafka => 3530842923    <<<< not matching across 
> all replicas
> Errors:
> The following error is found in the terminated leader:
> [2012-09-14 11:07:05,217] WARN No previously checkpointed highwatermark value 
> found for topic test_1 partition 1. Returning 0 as the highwatermark 
> (kafka.server.HighwaterMarkCheckpoint)
> [2012-09-14 11:07:05,220] ERROR Replica Manager on Broker 3: Error processing 
> leaderAndISR request LeaderAndIsrRequest(1,,true,1000,Map((test_1,1) -> { 
> "ISR": "1,2","leader": "1","leaderEpoch": "0" }, (test_1,0) -> { "ISR": "
> 1,2","leader": "1","leaderEpoch": "1" })) (kafka.server.ReplicaManager)
> kafka.common.KafkaException: End index must be segment list size - 1
>         at kafka.log.SegmentList.truncLast(SegmentList.scala:82)
>         at kafka.log.Log.truncateTo(Log.scala:471)
>         at kafka.cluster.Partition.makeFollower(Partition.scala:171)
>         at kafka.cluster.Partition.makeLeaderOrFollower(Partition.scala:126)
>         at 
> kafka.server.ReplicaManager.kafka$server$ReplicaManager$$makeFollower(ReplicaManager.scala:195)
>         at 
> kafka.server.ReplicaManager$$anonfun$becomeLeaderOrFollower$2.apply(ReplicaManager.scala:154)
>         at 
> kafka.server.ReplicaManager$$anonfun$becomeLeaderOrFollower$2.apply(ReplicaManager.scala:144)
>         at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:80)
>         at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:80)
>         at scala.collection.Iterator$class.foreach(Iterator.scala:631)
>         at 
> scala.collection.mutable.HashTable$$anon$1.foreach(HashTable.scala:161)
>         at 
> scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:194)
>         at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
>         at scala.collection.mutable.HashMap.foreach(HashMap.scala:80)
>         at 
> kafka.server.ReplicaManager.becomeLeaderOrFollower(ReplicaManager.scala:144)
>         at 
> kafka.server.KafkaApis.handleLeaderAndISRRequest(KafkaApis.scala:73)
>         at kafka.server.KafkaApis.handle(KafkaApis.scala:60)
>         at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:40)
>         at java.lang.Thread.run(Thread.java:662)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Closed] (KAFKA-514) Replication with Leader Failure Test: Log segment files checksum mismatch

Reply via email to