[ https://issues.apache.org/jira/browse/KAFKA-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15512485#comment-15512485 ]

Ben Stopford commented on KAFKA-4178:
-------------------------------------

Thanks Joel. Actually, this comment from that thread makes some sense:
"Basically, we were having issues with very large metric values when the metric
was very recently created."

My guess is that this difference in requirements comes from the fact that client
quotas throttle by imposing a delay. If you overestimate the metric, as is
possible with the Elapsed Window method, you can calculate a very long delay,
which might cause the client to time out. Replication throttling doesn't have
this issue in the same way, because an overestimate only affects replication
for as long as the metric is actually overestimated, which in practice is never
more than one or two sub-windows. But replication throttling does have an
issue with the Fixed Window approach, as it consistently underestimates for
the entire first window (i.e. ten sub-windows).
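To make the two failure modes concrete, here is a rough sketch under assumed numbers: ten 1-second sub-windows and steady traffic of 1,000 bytes every 100 ms (a true rate of 10,000 B/s). The class name, the simple total/window model and the delay formula are hypothetical illustrations, not Kafka's Rate, SimpleRate or quota code.

{code:java}
/**
 * Hypothetical sketch: how an Elapsed-window and a Fixed-window rate estimate
 * behave just after a metric is created. The "total / window" model and the
 * delay formula are illustrative assumptions, not Kafka's implementation.
 */
public final class WindowPolicyComparison {
    // Assumed config: 10 sub-windows of 1 second each.
    static final long SUB_WINDOW_MS = 1_000;
    static final int NUM_SUB_WINDOWS = 10;

    /** Elapsed policy: divide by the time the measurements actually span. */
    static double elapsedRate(double total, long firstMs, long lastMs) {
        long windowMs = Math.max(lastMs - firstMs, 1); // guard against a zero-length window
        return total * 1000.0 / windowMs;
    }

    /** Fixed policy: always divide by the full configured window. */
    static double fixedRate(double total) {
        long windowMs = SUB_WINDOW_MS * NUM_SUB_WINDOWS;
        return total * 1000.0 / windowMs;
    }

    /** Hypothetical quota delay: proportional to how far the estimate exceeds the quota. */
    static long hypotheticalDelayMs(double observedRate, double quotaRate, long windowMs) {
        if (observedRate <= quotaRate) {
            return 0;
        }
        return Math.round((observedRate - quotaRate) / quotaRate * windowMs);
    }

    public static void main(String[] args) {
        double quota = 10_000.0; // the client sends at exactly its quota
        // Steady traffic: 1,000 bytes at t = 0, 100, 200, ... ms (true rate 10,000 B/s).
        for (long lastMs : new long[] {100, 2_000, 10_000}) {
            double total = 1_000.0 * (lastMs / 100 + 1);
            double elapsed = elapsedRate(total, 0, lastMs);
            double fixed = fixedRate(total);
            long delay = hypotheticalDelayMs(elapsed, quota, SUB_WINDOW_MS * NUM_SUB_WINDOWS);
            System.out.printf("t=%5d ms  elapsed=%6.0f B/s  fixed=%6.0f B/s  delay(elapsed)=%5d ms%n",
                    lastMs, elapsed, fixed, delay);
        }
    }
}
{code}

With these assumptions, at t = 100 ms the Elapsed estimate is 20,000 B/s (twice the true rate), so the hypothetical delay formula would throttle a client that is exactly at its quota for 10 seconds. The Fixed estimate at the same moment is only 200 B/s and doesn't reach 10,100 B/s until t = 10,000 ms.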

So, if we really want to merge the approaches: I actually implemented another
type of rate (removed from this PR for simplicity), which I'll bring up here.
It's called FixedSubWindowPolicy, and you can see it in this commit:
https://github.com/benstopford/kafka/blob/edb51d1d0df04b06a980940f9688a0ab06112784/clients/src/main/java/org/apache/kafka/common/metrics/stats/Window.java

This is essentially a simple hybrid of the two approaches. If we really want to
consolidate on one approach, I believe this hybrid would be best. I'll
replicate it here as it's very simple:

{code:title=Window.java|borderStyle=solid}
    /**
     * This policy fixes the first sub-window. If measurements do not span
     * more than one sub-window then the whole sub-window duration is used
     * to calculate the rate.
     *
     * However, if measurements span multiple sub-windows, this policy
     * behaves identically to the Elapsed Window policy.
     *
     * This provides a slow start, similar to the Fixed Window policy,
     * but only over the duration of the first sub-window rather than all
     * sub-windows.
     *
     * This policy balances the other two: it has a short "slow start"
     * compared with the Fixed policy, after which it has the accuracy of
     * the Elapsed policy.
     */
    private static class FixedSubWindowPolicy implements Policy {
        @Override
        public long windowSize(long first, long last, MetricConfig config) {
            long elapsed = last - first;
            // Never use a window shorter than one sub-window (timeWindowMs).
            return elapsed < config.timeWindowMs() ? config.timeWindowMs() : elapsed;
        }
    } 
{code}
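The snippet depends on the Policy interface from that branch and on MetricConfig. To try it in isolation, here is a minimal compile-ready sketch. MetricConfig, Policy, ElapsedPolicy and FixedPolicy below are hypothetical stand-ins, not the classes in the linked commit. FixedSubWindowPolicy uses Math.max, which is equivalent to the ternary above.

{code:java}
/**
 * Minimal, hypothetical stand-ins so the FixedSubWindowPolicy snippet compiles
 * on its own. MetricConfig here is NOT Kafka's class; it only models
 * timeWindowMs() (the length of one sub-window) and samples().
 */
public final class FixedSubWindowPolicyDemo {

    /** Stand-in config: sub-window length and number of sub-windows. */
    static final class MetricConfig {
        private final long timeWindowMs;
        private final int samples;

        MetricConfig(long timeWindowMs, int samples) {
            this.timeWindowMs = timeWindowMs;
            this.samples = samples;
        }

        long timeWindowMs() { return timeWindowMs; }
        int samples() { return samples; }
    }

    /** Stand-in for the Policy interface used by the snippet. */
    interface Policy {
        long windowSize(long first, long last, MetricConfig config);
    }

    /** Elapsed (assumed form): the window is the time the measurements actually span. */
    static final class ElapsedPolicy implements Policy {
        @Override
        public long windowSize(long first, long last, MetricConfig config) {
            return last - first;
        }
    }

    /** Fixed (assumed form): the window is always all sub-windows. */
    static final class FixedPolicy implements Policy {
        @Override
        public long windowSize(long first, long last, MetricConfig config) {
            return config.timeWindowMs() * config.samples();
        }
    }

    /** Hybrid: never shorter than one sub-window, otherwise the elapsed time. */
    static final class FixedSubWindowPolicy implements Policy {
        @Override
        public long windowSize(long first, long last, MetricConfig config) {
            return Math.max(last - first, config.timeWindowMs());
        }
    }

    public static void main(String[] args) {
        MetricConfig config = new MetricConfig(1_000, 10); // ten 1-second sub-windows
        Policy[] policies = {new ElapsedPolicy(), new FixedPolicy(), new FixedSubWindowPolicy()};
        for (Policy policy : policies) {
            System.out.printf("%-20s window(0..100ms)=%5d  window(0..2500ms)=%5d%n",
                    policy.getClass().getSimpleName(),
                    policy.windowSize(0, 100, config),
                    policy.windowSize(0, 2_500, config));
        }
    }
}
{code}

Under that config, the hybrid reports a 1,000 ms window for measurements spanning 100 ms (Elapsed: 100 ms, Fixed: 10,000 ms) and a 2,500 ms window once they span 2,500 ms, matching Elapsed.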

So this approach underestimates only during the first sub-window (rather than
across all 10 sub-windows with Fixed, or just at the first measurement with
Elapsed). Unless your sub-window size is small relative to the interval between
measurements, it should work well for client throttling.
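A rough timeline for the hybrid, again under assumed numbers: 1-second sub-windows and 1,000 bytes every 100 ms, so a true rate of 10,000 B/s. HybridTimeline and its helpers are hypothetical names used only for illustration.

{code:java}
/**
 * Hypothetical timeline: steady traffic of 1,000 bytes every 100 ms (true rate
 * 10,000 B/s) with a 1,000 ms sub-window. Shows where the hybrid (fixed first
 * sub-window) estimate sits relative to the true rate. Illustrative only.
 */
public final class HybridTimeline {
    static final long SUB_WINDOW_MS = 1_000; // assumed sub-window length

    /** Hybrid window size: at least one sub-window, otherwise the elapsed time. */
    static long hybridWindowMs(long firstMs, long lastMs) {
        return Math.max(lastMs - firstMs, SUB_WINDOW_MS);
    }

    /** Rate in bytes/second for 1,000-byte measurements at t = 0, 100, ..., lastMs. */
    static double hybridRate(long lastMs) {
        double total = 1_000.0 * (lastMs / 100 + 1);
        return total * 1000.0 / hybridWindowMs(0, lastMs);
    }

    public static void main(String[] args) {
        for (long t : new long[] {100, 500, 900, 1_000, 2_000, 5_000}) {
            System.out.printf("t=%5d ms  hybrid=%7.0f B/s%n", t, hybridRate(t));
        }
    }
}
{code}

Under these assumptions the hybrid estimate climbs from 2,000 B/s at 100 ms to 10,000 B/s at 900 ms, so the underestimate is confined to the first sub-window. From 1 s onward it equals the Elapsed estimate (11,000 B/s at 1 s, 10,500 B/s at 2 s), whose small excess comes from counting the measurement taken at t = 0. A Fixed policy over ten 1-second sub-windows would report only 1,100 B/s at 1 s.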

It certainly appears to be the best compromise to me. Alternatively, we could
just keep both approaches; I still think there is a reasonable argument for each.

> Replication Throttling: Consolidate Rate Classes
> ------------------------------------------------
>
>                 Key: KAFKA-4178
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4178
>             Project: Kafka
>          Issue Type: Improvement
>          Components: replication
>    Affects Versions: 0.10.1.0
>            Reporter: Ben Stopford
>
> Replication throttling is using a different implementation of Rate to client 
> throttling (Rate & SimpleRate). These should be consolidated so both use the 
> same approach. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
