[ https://issues.apache.org/jira/browse/KAFKA-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068209#comment-14068209 ]
Jun Rao commented on KAFKA-1546:
--------------------------------

Jay, I think what you proposed is nice and simple. I think it works. One subtlety is dealing with max.wait.ms in the follower fetch request. Imagine that a follower has caught up and its fetch request is sitting in the purgatory. The last caught up time won't be updated for max.wait.ms if no new messages come in. When a message does come in, if max.wait.ms is larger than replica.lag.time.ms, the follower is now considered out of sync.

Perhaps we should use the timestamp of when the first message is appended to the leader after the last caught up time. If the amount of time since then is more than replica.lag.time.ms, the replica is considered out of sync.

> Automate replica lag tuning
> ---------------------------
>
>                 Key: KAFKA-1546
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1546
>             Project: Kafka
>          Issue Type: Improvement
>          Components: replication
>    Affects Versions: 0.8.0, 0.8.1, 0.8.1.1
>            Reporter: Neha Narkhede
>              Labels: newbie++
>
> Currently, there is no good way to tune the replica lag configs to
> automatically account for high and low volume topics on the same cluster.
> For the low-volume topic it will take a very long time to detect a lagging
> replica, and for the high-volume topic it will have false-positives.
> One approach to making this easier would be to have the configuration
> be something like replica.lag.max.ms and translate this into a number
> of messages dynamically based on the throughput of the partition.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
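
The check suggested in the comment above (start the lag clock at the first leader append after the follower was last caught up) could look roughly like the sketch below. It is an illustration only, not Kafka's code: the class FollowerLagTracker, its methods, and the offset bookkeeping are all hypothetical names introduced here, and timestamps are passed in explicitly as milliseconds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FollowerLagTracker:
    """Hypothetical per-follower state kept on the leader (not a Kafka class)."""

    replica_lag_time_ms: int
    leader_end_offset: int = 0   # leader's log end offset
    follower_offset: int = 0     # offset the follower has fetched up to
    # Time of the first leader append since the follower was last caught up.
    # None means the follower is currently caught up.
    first_unreplicated_append_ms: Optional[int] = None

    def on_leader_append(self, now_ms: int, count: int = 1) -> None:
        # Only the first append after a caught-up state starts the clock;
        # later appends leave the earlier timestamp in place.
        if self.first_unreplicated_append_ms is None:
            self.first_unreplicated_append_ms = now_ms
        self.leader_end_offset += count

    def on_follower_fetch(self, fetch_offset: int) -> None:
        self.follower_offset = fetch_offset
        if self.follower_offset >= self.leader_end_offset:
            # Follower has caught up again, so stop the clock.
            self.first_unreplicated_append_ms = None

    def is_out_of_sync(self, now_ms: int) -> bool:
        if self.first_unreplicated_append_ms is None:
            return False
        elapsed_ms = now_ms - self.first_unreplicated_append_ms
        return elapsed_ms > self.replica_lag_time_ms


# A follower caught up at t=0 whose fetch then waits in purgatory for 30 s
# (longer than the 10 s lag limit) before the first new message arrives.
tracker = FollowerLagTracker(replica_lag_time_ms=10_000)
tracker.on_leader_append(now_ms=30_000)
in_sync_right_after_append = not tracker.is_out_of_sync(now_ms=30_001)
lagging_without_fetch = tracker.is_out_of_sync(now_ms=40_001)
```

In this scenario a check based on the last caught up time alone would flag the follower as soon as the message arrives, because 30 s have passed since t=0. With the clock started at the append, the follower is flagged only if it fails to fetch that message within replica.lag.time.ms.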