[ 
https://issues.apache.org/jira/browse/KAFKA-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13858064#comment-13858064
 ] 

Hanish Bansal commented on KAFKA-1193:
--------------------------------------

If I don't perform the 5th step properly (i.e. if there is only one broker node
in the ISR list, wait for some time and check the ISR status of the topic again;
there should be 2 brokers in the ISR list), then I see logs like:
{quote}
[2013-12-23 10:25:07,648] DEBUG [OfflinePartitionLeaderSelector]: No broker in 
ISR is alive for [test-trunk111,1]. Pick the leader from the alive assigned 
replicas: 1 (kafka.controller.OfflinePartitionLeaderSelector)
[2013-12-23 10:25:07,648] WARN [OfflinePartitionLeaderSelector]: No broker in 
ISR is alive for [test-trunk111,1]. Elect leader 1 from live brokers 1. There's 
potential data loss. (kafka.controller.OfflinePartitionLeaderSelector)
[2013-12-23 10:25:07,649] INFO [OfflinePartitionLeaderSelector]: Selected new 
leader and ISR {"leader":1,"leader_epoch":1,"isr":[1]} for offline partition 
[test-trunk111,1] (kafka.controller.OfflinePartitionLeaderSelector)
{quote}

In this case, where only one broker is in the ISR list, I experienced 50-60%
data loss, whereas in the case where both brokers are in the ISR list I
experienced only 2-3% data loss.
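
To make the 5th step less error-prone, the ISR check could be automated before producing starts. Below is a minimal sketch in Java, assuming the status lines have the same layout as the kafka-list-topic output quoted in the issue description (with "isr:" as the last field). The class name IsrCheck and its methods are hypothetical helpers for illustration, not part of Kafka.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Kafka: checks the "isr:" field of
// status lines shaped like the kafka-list-topic output in this issue.
class IsrCheck {

    /** Returns the broker ids listed after "isr:" on one status line. */
    static List<Integer> parseIsr(String line) {
        List<Integer> isr = new ArrayList<>();
        int idx = line.indexOf("isr:");
        if (idx < 0) {
            return isr; // line has no isr field
        }
        // Assumption: "isr:" is the last field on the line.
        String field = line.substring(idx + "isr:".length()).trim();
        if (field.isEmpty()) {
            return isr;
        }
        for (String id : field.split(",")) {
            isr.add(Integer.parseInt(id.trim()));
        }
        return isr;
    }

    /** True only if every partition line lists at least replicationFactor brokers in its ISR. */
    static boolean allInSync(List<String> lines, int replicationFactor) {
        for (String line : lines) {
            if (line.contains("isr:") && parseIsr(line).size() < replicationFactor) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> status = Arrays.asList(
            "topic: test-trunk111    partition: 0    leader: 0    replicas: 1,0    isr: 0,1",
            "topic: test-trunk111    partition: 1    leader: 0    replicas: 0,1    isr: 0,1");
        System.out.println("all replicas in sync: " + allInSync(status, 2));
    }
}
```

A test script would re-run the topic listing and call allInSync until it returns true, and only then start the producer.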


> Data loss if broker is killed using kill -9
> -------------------------------------------
>
>                 Key: KAFKA-1193
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1193
>             Project: Kafka
>          Issue Type: Bug
>          Components: replication
>    Affects Versions: 0.8.0, 0.8.1
>         Environment: Centos 6.3
>            Reporter: Hanish Bansal
>            Assignee: Neha Narkhede
>
> We have a Kafka cluster of 2 nodes (using Kafka version 0.8.0).
> Replication Factor: 2
> Number of partitions: 2
> Actual Behaviour:
> -------------------------
> Of the two nodes, if the leader node goes down, data loss occurs.
> Steps to Reproduce:
> ------------------------------
> 1. Create a 2 node kafka cluster with replication factor 2
> 2. Start the Kafka cluster
> 3. Create a topic, let's say "test-trunk111".
> 4. Restart any one node.
> 5. Check topic status using kafka-list-topic tool.
> topic isr status is:
> topic: test-trunk111    partition: 0    leader: 0    replicas: 1,0    isr: 0,1
> topic: test-trunk111    partition: 1    leader: 0    replicas: 0,1    isr: 0,1
> If there is only one broker node in isr list then wait for some time and 
> again check isr status of topic. There should be 2 brokers in isr list.
> 6. Start producing the data.
> 7. Kill the leader node (broker-0 in our case) while data is being produced.
> 8. After all data is produced, start the consumer.
> 9. Observe the behaviour. There is data loss.
> After leader goes down, topic isr status is:
> topic: test-trunk111    partition: 0    leader: 1    replicas: 1,0    isr: 1
> topic: test-trunk111    partition: 1    leader: 1    replicas: 0,1    isr: 1
> We have tried the following to avoid data loss:
> ----------------------------------------------------------------
> 1. Configured "request.required.acks=-1" in the producer configuration,
> because according to the documentation at
> http://kafka.apache.org/documentation.html#producerconfigs, setting this
> value to -1 provides a guarantee that no messages will be lost.
> 2. Increased the "message.send.max.retries" from 3 to 10 in producer 
> configuration.
> 3. Set "controlled.shutdown.enable" to true in broker configuration.
> 4. Tested with Kafka-0.8.1 after applying patch KAFKA-1188.patch available on 
> https://issues.apache.org/jira/browse/KAFKA-1188 
> None of the above helps when the leader node is killed using
> "kill -9 <pid>".
> Expected Behaviour:
> ----------------------------
> No data should be lost.
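
For reference, the producer-side settings listed in the description can be written down as plain properties. This is only a sketch: it uses java.util.Properties with the two keys and values named in the issue, and the class name ProducerSettings is hypothetical. The broker-side "controlled.shutdown.enable" setting is left out because it belongs in the broker configuration, not the producer's.

```java
import java.util.Properties;

// Hypothetical holder for the producer settings described in this issue.
class ProducerSettings {

    /** Builds the producer properties exactly as listed in the issue description. */
    static Properties build() {
        Properties props = new Properties();
        // The issue sets this to -1, citing the producer config documentation.
        props.put("request.required.acks", "-1");
        // Raised from 3 to 10 according to the issue.
        props.put("message.send.max.retries", "10");
        return props;
    }

    public static void main(String[] args) {
        // Print the settings in key=value form for inspection.
        build().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The resulting Properties object would then be passed to whatever producer configuration the test client uses.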



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
