[ https://issues.apache.org/jira/browse/KAFKA-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15512485#comment-15512485 ]
Ben Stopford commented on KAFKA-4178:
-------------------------------------

Thanks Joel. Actually this comment, from that thread, makes some sense:

"Basically, we were having issues with very large metric values when the metric was very recently created."

My guess is that this difference in requirements comes from the fact that client quotas throttle by imposing a delay. So if you overestimate the metric, as is possible when using the Elapsed Window method, you could calculate a very long delay, which might cause a client to time out.

Replication throttling doesn't have this issue in the same way, as an overestimate will only affect replication for as long as the metric is actually overestimated, which is never more than one or two sub-windows in practice. But replication throttling does have an issue with the Fixed Window approach, as it consistently underestimates for the entire first window (i.e. ten sub-windows).

So, if we really want to merge the approaches: I actually implemented another type of rate (removed from this PR for simplicity), which I'll bring up here. You can see it in this commit, where it's called FixedSubWindowPolicy:

https://github.com/benstopford/kafka/blob/edb51d1d0df04b06a980940f9688a0ab06112784/clients/src/main/java/org/apache/kafka/common/metrics/stats/Window.java

This is essentially a simple hybrid of both approaches. If we really want to consolidate on one approach, I believe this hybrid would be best. I'll replicate it here as it's very simple:

{code:title=Window.java|borderStyle=solid}
    /**
     * This policy fixes the first sub-window. If measurements do not span
     * more than one sub-window then the whole sub-window duration is used
     * to calculate the rate.
     *
     * However, if there are measurements spanning multiple sub-windows, this
     * rate behaves identically to the elapsed window policy.
     *
     * So this provides a slow start, in a similar fashion to FixedWindows,
     * but only over the duration of the first sub-window rather than all
     * sub-windows.
     *
     * This policy provides a balance between the other two. It has a short
     * "slow start", in comparison to the Fixed policy, after which it will have
     * the accuracy of the Elapsed policy.
     */
    private static class FixedSubWindowPolicy implements Policy {
        @Override
        public long windowSize(long first, long last, MetricConfig config) {
            long elapsed = last - first;
            // Never divide by less than one full sub-window.
            return elapsed < config.timeWindowMs() ? config.timeWindowMs() : elapsed;
        }
    }
{code}

So this approach will only underestimate in the first sub-window (rather than all 10 in Fixed, or just the first measurement in Elapsed). Unless your sub-window size is small in relation to the measurement frequency, it should work well for client throttling. It certainly appears to be the best compromise to me.

Alternatively, we just stick with both approaches. I still think there is a reasonable argument for both.

> Replication Throttling: Consolidate Rate Classes
> ------------------------------------------------
>
>                 Key: KAFKA-4178
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4178
>             Project: Kafka
>          Issue Type: Improvement
>          Components: replication
>    Affects Versions: 0.10.1.0
>            Reporter: Ben Stopford
>
> Replication throttling is using a different implementation of Rate to client
> throttling (Rate & SimpleRate). These should be consolidated so both use the
> same approach.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
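
To make the comparison in the comment concrete, here is a minimal, self-contained sketch of how the three window policies would turn the same early measurements into a rate. The class and method names (WindowPolicySketch, fixedWindow, elapsedWindow, fixedSubWindow, ratePerSecond) are hypothetical and exist only for illustration. The Fixed and Elapsed variants are simplified readings of the descriptions in the comment, not the actual Kafka classes. Only the hybrid mirrors the FixedSubWindowPolicy logic quoted in the comment.

```java
public class WindowPolicySketch {

    // Fixed (simplified assumption): always divide by the whole configured window.
    static long fixedWindow(long subWindowMs, int numSubWindows) {
        return subWindowMs * numSubWindows;
    }

    // Elapsed (simplified assumption): divide only by the time the measurements span.
    static long elapsedWindow(long firstMs, long lastMs) {
        return lastMs - firstMs;
    }

    // Hybrid: same rule as the quoted FixedSubWindowPolicy, i.e. the elapsed time,
    // but never less than one sub-window.
    static long fixedSubWindow(long firstMs, long lastMs, long subWindowMs) {
        return Math.max(lastMs - firstMs, subWindowMs);
    }

    // Convert a total recorded over windowMs milliseconds to a per-second rate.
    static double ratePerSecond(double total, long windowMs) {
        return total * 1000.0 / windowMs;
    }

    public static void main(String[] args) {
        long subWindowMs = 1000;   // hypothetical sub-window size
        int numSubWindows = 10;    // ten sub-windows, as in the comment
        double bytes = 100;        // total recorded so far
        long first = 0, last = 100; // measurements span only 100 ms

        System.out.println("fixed:   " + ratePerSecond(bytes, fixedWindow(subWindowMs, numSubWindows)));
        System.out.println("elapsed: " + ratePerSecond(bytes, elapsedWindow(first, last)));
        System.out.println("hybrid:  " + ratePerSecond(bytes, fixedSubWindow(first, last, subWindowMs)));
    }
}
```

With these assumed numbers, 100 bytes recorded over the first 100 ms gives 10 bytes/s under Fixed, 1000 bytes/s under Elapsed, and 100 bytes/s under the hybrid. Once the measurements span more than one sub-window, the hybrid and Elapsed results coincide.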