[ https://issues.apache.org/jira/browse/KAFKA-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659503#comment-14659503 ]
James Cheng commented on KAFKA-2143:
------------------------------------

Thanks [~becket_qin]. [~ewencp] sent me that slide deck of yours. I added all your suggested settings to my team's "Kafka recommended settings" wiki. :)

Even with acks=-1, I think it's still possible for a replica to lag behind the leader by 2000 messages, right? It would be lagging behind by 2000 messages that were produced but not yet acknowledged. I'm not sure whether losing those would count as message loss, since they were never actually ack'd.

replica.lag.max.messages is a broker-level config setting, not a topic-level config setting. I think that means it's hard to find a single value that works for both high-volume topics and low-volume topics, right?

And after a bit of Googling around, I think my observation is the exact thing that inspired https://cwiki.apache.org/confluence/display/KAFKA/KIP-16+-+Automated+Replica+Lag+Tuning and http://www.confluent.io/blog/hands-free-kafka-replication-a-lesson-in-operational-simplicity/, right?

Is it possible for a broker to be in the ISR for one topic, and out of the ISR for another topic?

> Replicas get ahead of leader and fail
> -------------------------------------
>
>                 Key: KAFKA-2143
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2143
>             Project: Kafka
>          Issue Type: Bug
>          Components: replication
>    Affects Versions: 0.8.2.1
>            Reporter: Evan Huus
>            Assignee: Jiangjie Qin
>             Fix For: 0.8.3
>
>
> On a cluster of 6 nodes, we recently saw a case where a single under-replicated partition suddenly appeared, replication lag spiked, and network IO spiked. The cluster appeared to recover eventually on its own.
> Looking at the logs, the thing which failed was partition 7 of the topic {{background_queue}}. It had an ISR of 1,4,3 and its leader at the time was 3. Here are the interesting log lines:
> On node 3 (the leader):
> {noformat}
> [2015-04-23 16:50:05,879] ERROR [Replica Manager on Broker 3]: Error when processing fetch request for partition [background_queue,7] offset 3722949957 from follower with correlation id 148185816. Possible cause: Request for offset 3722949957 but we only have log segments in the range 3648049863 to 3722949955. (kafka.server.ReplicaManager)
> [2015-04-23 16:50:05,879] ERROR [Replica Manager on Broker 3]: Error when processing fetch request for partition [background_queue,7] offset 3722949957 from follower with correlation id 156007054. Possible cause: Request for offset 3722949957 but we only have log segments in the range 3648049863 to 3722949955. (kafka.server.ReplicaManager)
> [2015-04-23 16:50:13,960] INFO Partition [background_queue,7] on broker 3: Shrinking ISR for partition [background_queue,7] from 1,4,3 to 3 (kafka.cluster.Partition)
> {noformat}
> Note that both replicas suddenly asked for an offset *ahead* of the available offsets.
> And on nodes 1 and 4 (the replicas), many occurrences of the following:
> {noformat}
> [2015-04-23 16:50:05,935] INFO Scheduling log segment 3648049863 for log background_queue-7 for deletion. (kafka.log.Log)
> {noformat}
> Based on my reading, this looks like the replicas somehow got *ahead* of the leader, asked for an invalid offset, got confused, and re-replicated the entire partition from scratch to recover (this matches our network graphs, which show 3 sending a bunch of data to 1 and 4).
> Taking a stab in the dark at the cause: there appears to be a race condition where replicas can receive a new offset before the leader has committed it and is ready to replicate?
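For concreteness, here is a minimal sketch of the producer side of the settings discussed in the comment above, using the standard Java producer client. The broker address and key/value contents are placeholders; replica.lag.max.messages has to go in each broker's server.properties, since (as noted) it is broker-level rather than per-topic:

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class DurableProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // placeholder address
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        // acks=-1 (a.k.a. "all"): the leader acknowledges only after the
        // current ISR has the message. Per the comment above, a follower can
        // still trail the leader by produced-but-unacknowledged messages.
        props.put("acks", "-1");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.send(new ProducerRecord<>("background_queue", "key", "value"));
        producer.close();

        // Broker-side, in server.properties (applies to every topic/partition
        // hosted by that broker -- there is no per-topic override):
        //   replica.lag.max.messages=4000
    }
}
{code}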
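And to make the suspected failure mode in the description concrete, a loose sketch of a follower's fetch/recovery loop. Log, FetchResult, fetchFromLeader, and leaderEarliestOffset are hypothetical stand-ins, not Kafka's actual ReplicaFetcherThread internals:

{code:java}
// Illustrative sketch only -- all helper names here are hypothetical.
void followerFetchLoop(TopicPartition partition, Log log) {
    long fetchOffset = log.logEndOffset();
    while (true) {
        FetchResult result = fetchFromLeader(partition, fetchOffset);
        if (result.isOffsetOutOfRange()) {
            // The generic recovery path assumes the follower has fallen
            // *behind* and its local data is stale: wipe the local log and
            // restart from the leader's earliest available offset.
            long restartFrom = leaderEarliestOffset(partition);
            log.truncateFullyAndStartAt(restartFrom); // drop local segments
            fetchOffset = restartFrom;                // refetch everything
        } else {
            log.append(result.records());
            fetchOffset = log.logEndOffset();
        }
    }
}
{code}

The point of the sketch: the out-of-range recovery path only anticipates a follower that is behind, so when a follower somehow gets *ahead* of the leader (as in this bug), that same path deletes the local segments and re-copies the entire partition from the leader, which would match the network spike from broker 3 to brokers 1 and 4.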
-- This message was sent by Atlassian JIRA (v6.3.4#6332)