> On May 13, 2015, 5:14 p.m., Jay Kreps wrote:
> > clients/src/main/java/org/apache/kafka/common/metrics/stats/Rate.java, line 
> > 62
> > <https://reviews.apache.org/r/34170/diff/1/?file=958215#file958215line62>
> >
> >     Is this actually right? I agree you'll get discontinuities as the 
> > measured time shrinks to zero but why is giving back 0 the right answer? In 
> > the case you guys were testing 0 was safe, but imagine a case where the 
> > monitoring was checking that the value didn't fall below some threshold.
> 
> Dong Lin wrote:
>     The question is, when Rate.measure() is called right after 
> Rate.record(n), what should be the return value? I think there are two 
> possibilities: 0 and n/config.timeWindowMs(). I didn't find any use case 
> where these two values make a difference. 
>     
>     Which value do you think is the best?
>     
>     Thank you.
> 
> Aditya Auradkar wrote:
>     I think returning 0 is reasonable if no time has elapsed (technically).
>     As an alternate solution, what if we assumed that "elapsed" is always 1 
> (at least). For example:
>     
>     double elapsed = convert(now - stat.oldest(now).lastWindowMs) + 1
>     
>     In case of seconds, this basically means that you assume the current 
> second is always complete. This is only a problem (for a couple of seconds) 
> when all previous samples have zero activity or when the server is just 
> starting up.
> 
> Jay Kreps wrote:
>     Actually there is a fundamental issue with the computation that patching 
> around the 0 case doesn't fix. That is the instability of the estimate. This 
> is the "taking a poll with sample size one" problem.
>     
>     Even if you patch the 0 case you still get a bad answer 1 ms later. That 
> is, let's say you get a single 50k request and your quota is 1MB/sec. 
> Currently at 0ms we estimate infinity which is in fact the measured rate but 
> obviously not a good estimate. But even 1 ms later the estimate is bad. 
> 50k*1000ms = ~50MB/sec.
>     
>     This is somewhat rare because it only happens when there is just one 
> sample.
>     
>     They key observation is that if a sample is missing, nothing happened in 
> that time period. But the calculation should still use that time period.
>     
>     So the right way to compute it, I think, is actually
>       ellapsed = (num_samples-1)*window_size + (now - current_sample.begin)
>       
>     For safety I think we should also require the number of samples to be >= 
> 2 and default it to 3.
> 
> Dong Lin wrote:
>     Hi Jay: thanks for comments!
>     
>     I think the problem here is that, when event number is very small (e.g. 
> 0, 1), what value should rate.measure() return, right? If I understand your 
> solution formula right, when there is only one sample, your measured rate is 
> n/ellapsed = n/(now - current_sample.begin), which is exactly same as the 
> current code. I agree that requiring number of samples to be >= 2 solve the 
> problem. But what happens when user call rate.measure() when there is only 1 
> sample?
>     
>     I agree with you that we still get a bad answer if we patch the 0 case. 
> How about we patch the 0 to timeWindowMs case: if (ellapsed < timeWindowMs) 
> then ellapsed = timeWindowMs. Does this solve the problem here?

I think we all agree that there is problem with rate measure when ellapsed time 
is too small. What we need to properly define the rate measure to handle such 
case, which is currently missing.

If we want to require number of samples to be >= 2 as Jay suggested, we can 
also let ellapsed = max(ellapsed, 2*timeWindowMs) and keep the rest of the 
code, right?


- Dong


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34170/#review83629
-----------------------------------------------------------


On May 13, 2015, 10:32 p.m., Dong Lin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34170/
> -----------------------------------------------------------
> 
> (Updated May 13, 2015, 10:32 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-2191
>     https://issues.apache.org/jira/browse/KAFKA-2191
> 
> 
> Repository: kafka
> 
> 
> Description
> -------
> 
> KAFKA-2191; Measured rate should not be infinite
> 
> 
> Diffs
> -----
> 
>   clients/src/main/java/org/apache/kafka/common/metrics/stats/Rate.java 
> 98429da34418f7f1deba1b5e44e2e6025212edb3 
>   
> clients/src/main/java/org/apache/kafka/common/metrics/stats/SampledStat.java 
> b341b7daaa10204906d78b812fb05fd27bc69373 
> 
> Diff: https://reviews.apache.org/r/34170/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Dong Lin
> 
>

Reply via email to