Hanish Bansal created KAFKA-1193:
------------------------------------

             Summary: Data loss if broker is killed using kill -9
                 Key: KAFKA-1193
                 URL: https://issues.apache.org/jira/browse/KAFKA-1193
             Project: Kafka
          Issue Type: Bug
          Components: replication
    Affects Versions: 0.8.0, 0.8.1
         Environment: Centos 6.3
            Reporter: Hanish Bansal
            Assignee: Neha Narkhede

We have a Kafka cluster of 2 nodes (using Kafka 0.8.0).
Replication Factor: 2
Number of partitions: 2

Actual Behaviour:
-------------------------
If the leader node of the two goes down, data loss occurs.

Steps to Reproduce:
------------------------------
1. Create a 2-node Kafka cluster with replication factor 2.
2. Start the Kafka cluster.
3. Create a topic, let's say "test-trunk111".
4. Restart any one node.
5. Check the topic status using the kafka-list-topic tool.

   Topic ISR status is:

   topic: test-trunk111 partition: 0 leader: 0 replicas: 1,0 isr: 0,1
   topic: test-trunk111 partition: 1 leader: 0 replicas: 0,1 isr: 0,1

   If there is only one broker in the ISR list, wait for some time and check the ISR status of the topic again. There should be 2 brokers in the ISR list.
6. Start producing data.
7. Kill the leader node (broker-0 in our case) while data is being produced.
8. After all data is produced, start the consumer.
9. Observe the behaviour. There is data loss.

   After the leader goes down, topic ISR status is:

   topic: test-trunk111 partition: 0 leader: 1 replicas: 1,0 isr: 1
   topic: test-trunk111 partition: 1 leader: 1 replicas: 0,1 isr: 1

We have tried the following things to avoid data loss:
----------------------------------------------------------------
1. Configured "request.required.acks=-1" in the producer configuration because, as mentioned in the documentation http://kafka.apache.org/documentation.html#producerconfigs, setting this value to -1 provides a guarantee that no messages will be lost.
2. Increased "message.send.max.retries" from 3 to 10 in the producer configuration.
3. Set "controlled.shutdown.enable" to true in the broker configuration.
4. Tested with Kafka-0.8.1 after applying the patch KAFKA-1188.patch available on https://issues.apache.org/jira/browse/KAFKA-1188

None of the above works when the leader node is killed using "kill -9 <pid>".

Expected Behaviour:
----------------------------
No data should be lost.

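The ISR check in step 5 can be automated when repeating the reproduction. Below is a minimal sketch, assuming the tool output has exactly the line format quoted in this report (real output may differ between versions). The names parse_status and under_replicated are hypothetical helpers written for illustration and are not part of Kafka.

```python
import re

# Matches one status line in the format quoted in this report, e.g.
# "topic: test-trunk111 partition: 0 leader: 0 replicas: 1,0 isr: 0,1"
# (assumed format; adjust the pattern if your tool prints something else).
STATUS_RE = re.compile(
    r"topic:\s*(?P<topic>\S+)\s+partition:\s*(?P<partition>\d+)\s+"
    r"leader:\s*(?P<leader>-?\d+)\s+replicas:\s*(?P<replicas>[\d,]+)\s+"
    r"isr:\s*(?P<isr>[\d,]*)"
)


def parse_status(text):
    """Return one dict per partition status line found in the text."""
    partitions = []
    for m in STATUS_RE.finditer(text):
        partitions.append({
            "topic": m.group("topic"),
            "partition": int(m.group("partition")),
            "leader": int(m.group("leader")),
            "replicas": {int(b) for b in m.group("replicas").split(",")},
            "isr": {int(b) for b in m.group("isr").split(",") if b},
        })
    return partitions


def under_replicated(partitions):
    """Return ids of partitions whose ISR lacks an assigned replica."""
    return [p["partition"] for p in partitions if p["isr"] != p["replicas"]]
```

Producing should start (step 6) only once under_replicated returns an empty list for the topic. Applied to the status quoted after the leader is killed, it reports both partitions, since only broker 1 remains in the ISR.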
--
This message was sent by Atlassian JIRA (v6.1.5#6160)