[ https://issues.apache.org/jira/browse/KAFKA-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yuto Kawamura closed KAFKA-3471.
--------------------------------

> min.insync.replicas isn't respected when there's a delaying follower who still in ISR
> -------------------------------------------------------------------------------------
>
>                 Key: KAFKA-3471
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3471
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.9.0.1
>            Reporter: Yuto Kawamura
>
> h2. tl;dr;
> Partition.checkEnoughReplicasReachOffset should look at the number of followers that have already caught up to requiredOffset, instead of the high watermark, when deciding whether enough replicas have acknowledged a produce request.
>
> h2. Description
> Recently I found an interesting metric on our Kafka cluster.
> During peak time, the number of produce requests dropped significantly on a single broker. Let's say this broker's id=1.
> - broker-1 holds leadership for 3 partitions of topic T.
> - Every producer is configured with acks=all.
> - broker-1 hosts several topics; each topic is configured with 3 replicas and min.insync.replicas=2.
> - Partition 1 of topic T (T-1) has 3 replicas: broker-1 (leader), broker-2 (follower-A), broker-3 (follower-B).
> Looking at the logs of broker-1, there were lots of entries indicating that the ISR for T-1 was expanding and shrinking frequently.
> After investigating for a while, we restarted broker-1 and the continuous ISR expand/shrink unexpectedly stopped.
> Since this is most likely a state-corruption issue (it was fixed by a simple restart) and it never reproduced after the restart, we unfortunately lost the clues to understand what was actually happening, so to this day I don't know the cause of that phenomenon.
> However, we kept investigating why frequent ISR shrink/expand reduces the number of produce requests, and found that the Kafka broker does not appear to respect min.insync.replicas the way the documentation of this config describes.
> Here's the scenario:
> 1. Everything is working well.
>    ISR(=LogEndOffset): leader=3, follower-A=2, follower-B=2
>    HW: 2
> 2. A producer client produces some records. For simplicity it contains only one record, so the LogEndOffset is updated to 4 and the request is put into purgatory, since it has requiredOffset=4 while the HW stays at 2.
>    ISR(=LogEndOffset): leader=4, follower-A=2, follower-B=2
>    HW: 2
> 3. follower-A performs a fetch and updates its LogEndOffset to 4. IIUC, the request received at 2. could be considered successful at this point, since it only requires 2 out of 3 replicas to be in sync (min.insync.replicas=2, acks=all), but it is not with the current implementation because of the HW (ref: https://github.com/apache/kafka/blob/1fbe445dde71df0023a978c5e54dd229d3d23e1b/core/src/main/scala/kafka/cluster/Partition.scala#L325).
>    ISR(=LogEndOffset): leader=4, follower-A=4, follower-B=2
>    HW: 2
> 4. For some reason, follower-B cannot fetch for a while. At this moment follower-B is still in the ISR because of replica.lag.time.max.ms, meaning it still holds back the HW.
>    ISR(=LogEndOffset): leader=4, follower-A=4, follower-B=2
>    HW: 2
> 5. The produce request received at 2. times out, is considered failed, and the client retries. Any incoming requests for T-1 cannot succeed during this period.
>    ISR(=LogEndOffset): leader=4, follower-A=4, follower-B=2
>    HW: 2
> 6. The leader decides to remove follower-B from the ISR because of replica.lag.time.max.ms. The HW advances to 4 and all produce requests can now be processed successfully.
>    ISR(=LogEndOffset): leader=4, follower-A=4
>    HW: 4
> 7. After a while follower-B comes back and catches up to the LogEndOffset, so the leader lets it back into the ISR. The situation returns to 1. and the cycle continues.
> So my understanding is that records on a producer are accumulated into a single batch while produce requests for T-1 are blocked (and retried) during 4.-6., which is why the total number of requests decreased significantly on broker-1 while the total number of messages stayed the same.
> As I commented on 3., the leader should consider a produce request successful once it has confirmed min.insync.replicas acknowledgements, so the current implementation, which makes produce requests dependent on the HW, doesn't make sense IMO.
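> To make the idea concrete, here is a minimal, self-contained sketch of what the check could look like if it counted ISR members that have reached requiredOffset instead of comparing against the HW. This is not the actual patch: the type ReplicaState, the object names, and the (Boolean, Option[String]) result are simplified stand-ins, and the real method lives in Partition.scala with a different signature.
> {code:scala}
> // Simplified sketch only, not the actual Partition code: replicas are modeled as
> // plain (replicaId, logEndOffset) pairs and the leader is assumed to be included
> // in the ISR list that is passed in.
> case class ReplicaState(replicaId: Int, logEndOffset: Long)
>
> object InSyncAckCheck {
>   // Decide the fate of a pending acks=all produce request that needs
>   // `requiredOffset` to be replicated.
>   //
>   // Current behaviour (roughly): succeed only once HW >= requiredOffset, which a
>   // single lagging ISR member can hold back until it is dropped from the ISR.
>   //
>   // Proposed behaviour: succeed as soon as at least minInsyncReplicas ISR members
>   // (leader included) have logEndOffset >= requiredOffset.
>   def checkEnoughReplicasReachOffset(isr: Seq[ReplicaState],
>                                      requiredOffset: Long,
>                                      minInsyncReplicas: Int): (Boolean, Option[String]) = {
>     if (isr.size < minInsyncReplicas) {
>       // ISR has shrunk below the minimum -> NotEnoughReplicas-style error.
>       (false, Some("NotEnoughReplicas"))
>     } else {
>       val caughtUp = isr.count(_.logEndOffset >= requiredOffset)
>       if (caughtUp >= minInsyncReplicas) (true, None) // ack the produce request now
>       else (false, None)                              // keep waiting in purgatory
>     }
>   }
> }
>
> object Demo extends App {
>   // Step 3. of the scenario: leader=4, follower-A=4, follower-B=2, requiredOffset=4.
>   // With min.insync.replicas=2 this is acknowledged even though the HW is still 2,
>   // because only follower-B is lagging.
>   val isr = Seq(ReplicaState(1, 4L), ReplicaState(2, 4L), ReplicaState(3, 2L))
>   println(InSyncAckCheck.checkEnoughReplicasReachOffset(isr, requiredOffset = 4L, minInsyncReplicas = 2))
>   // => (true,None)
> }
> {code}
> With a check like this, the stall in step 5. would not occur: the request is acknowledged at step 3., and a lagging follower-B only matters once the ISR actually shrinks below min.insync.replicas.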
> When I observed this scenario our Kafka cluster was running version 0.8.2.1, but I confirmed that it can still happen with a build from the latest trunk.
> Actually I still don't know whether this is intentional behavior or not, but either way it differs from what I understand when I read the documentation of min.insync.replicas, so I would like to hear what Kafka people think about this.
> I will prepare a PR for this issue, so please review my patch if my scenario makes sense to you :)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)