We saw strange behavior with Kafka 0.8.2 brokers today.

Scenario:
We have 3 Kafka brokers in dev, and each topic has a replication factor of 3. 
The cluster hosts about 30 topics. One of them, topic X, has 10 partitions. For 
topic X only, 1 partition has been under-replicated for at least the last few 
weeks. Brokers are not flapping in and out of the ISR, and hardly any data was 
added to topic X during that time.

Topic:X    PartitionCount:10       ReplicationFactor:3     Configs:max.message.bytes=8000000
        Topic: X   Partition: 0    Leader: 1       Replicas: 1,0,2 Isr: 2,1,0
        Topic: X   Partition: 1    Leader: 2       Replicas: 2,1,0 Isr: 2,1,0
        Topic: X   Partition: 2    Leader: 2       Replicas: 0,2,1 Isr: 2,1,0
        Topic: X   Partition: 3    Leader: 1       Replicas: 1,2,0 Isr: 2,1,0
        Topic: X   Partition: 4    Leader: 2       Replicas: 2,0,1 Isr: 2
        Topic: X   Partition: 5    Leader: 2       Replicas: 0,1,2 Isr: 2,1,0
        Topic: X   Partition: 6    Leader: 1       Replicas: 1,0,2 Isr: 2,1,0
        Topic: X   Partition: 7    Leader: 2       Replicas: 2,1,0 Isr: 2,1,0
        Topic: X   Partition: 8    Leader: 2       Replicas: 0,2,1 Isr: 2,1,0
        Topic: X   Partition: 9    Leader: 1       Replicas: 1,2,0 Isr: 2,1,0

Note: Partition 4 has had ISR: 2 for at least 7 days. Not much data has been 
added to topic X, as it is a retry topic.
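
For anyone who wants to check their own cluster for the same condition, here 
is a minimal sketch that parses describe output in the format shown above and 
reports which replicas are missing from the ISR. The function name 
under_replicated and the embedded sample text are illustrative assumptions, not 
part of any Kafka tooling.

```python
import re

# Two lines copied from the describe output above, used as sample input.
DESCRIBE_OUTPUT = """\
        Topic: X   Partition: 3    Leader: 1       Replicas: 1,2,0 Isr: 2,1,0
        Topic: X   Partition: 4    Leader: 2       Replicas: 2,0,1 Isr: 2
"""

# Matches one per-partition line; the topic summary line does not match.
LINE_RE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>-?\d+)\s+Replicas:\s*(?P<replicas>[\d,]+)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def under_replicated(describe_text):
    """Return (topic, partition, missing_replicas) for each partition
    whose ISR is smaller than its assigned replica set."""
    found = []
    for line in describe_text.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue
        replicas = set(match.group("replicas").split(","))
        isr = set(filter(None, match.group("isr").split(",")))
        missing = replicas - isr
        if missing:
            found.append(
                (match.group("topic"), int(match.group("partition")), sorted(missing))
            )
    return found


for topic, partition, missing in under_replicated(DESCRIBE_OUTPUT):
    print(f"{topic}-{partition}: replicas not in ISR: {','.join(missing)}")
```

For the sample above this flags only partition 4, with brokers 0 and 1 missing 
from the ISR.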

We bounced all the brokers, which resolved the under-replication, but I think 
we lost data (more details below).

Questions
1) Why was data not replicated from the leader to the followers? I could 
understand it if the data volume were high, but this topic gets little data: a 
few thousand messages per day.
2) When we restarted all the brokers, the old leader became a follower and 
rolled its offset back to an older one. I don't understand how this data loss 
can happen. If broker 2 dies, shouldn't the ISR list be empty, with no leader 
selected for that partition?

[2016-04-05 13:59:24,059] INFO Partition [X,4] on broker 0: Expanding ISR for 
partition [X,4] from 0 to 0,1 (kafka.cluster.Partition)
[2016-04-05 13:59:24,244] INFO [ReplicaFetcherThread-8-2], Stopped  
(kafka.server.ReplicaFetcherThread)
[2016-04-05 14:00:14,279] ERROR [Replica Manager on Broker 0]: Error when 
processing fetch request for partition [X,4] offset 187185 from follower with 
correlation id 0. Possible cause: Request for offset 187185 but we only have 
log segments in the range 166211 to 166211. (kafka.server.ReplicaManager)
[2016-04-05 14:00:14,352] INFO Partition [X,4] on broker 0: Expanding ISR for 
partition [X,4] from 0,1 to 0,1,2 (kafka.cluster.Partition)

When broker 2 was restarted, the following seems to have happened, judging from 
the log:
1) Broker 0 became the leader with offset 166211
2) Broker 1 joined as a follower with offset 166211
3) Broker 2 joined as a follower with offset 187185 but was then reset to 
166211.
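
To put a number on the suspected loss, here is a rough sketch of the sequence 
above. It is a toy model built only from the offsets in the log excerpt, not 
Kafka's actual truncation code, and the names log_end and truncated_offsets are 
made up for illustration.

```python
# Log end offsets per broker for partition [X,4], taken from the log excerpt.
log_end = {0: 166211, 1: 166211, 2: 187185}
new_leader = 0  # broker 0 became leader after the restart


def truncated_offsets(follower_end, leader_end):
    """Number of offsets a follower drops if it resets to the leader's end offset."""
    return max(0, follower_end - leader_end)


for broker, end in sorted(log_end.items()):
    if broker == new_leader:
        continue
    dropped = truncated_offsets(end, log_end[new_leader])
    print(f"broker {broker}: end offset {end}, dropped {dropped}")
```

Taking the offsets at face value, broker 2 would have discarded 
187185 - 166211 = 20974 offsets that existed only on the old leader.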

Thanks in advance for any insight.






                                          
