> On May 13, 2015, 5:14 p.m., Jay Kreps wrote: > > clients/src/main/java/org/apache/kafka/common/metrics/stats/Rate.java, line > > 62 > > <https://reviews.apache.org/r/34170/diff/1/?file=958215#file958215line62> > > > > Is this actually right? I agree you'll get discontinuities as the > > measured time shrinks to zero but why is giving back 0 the right answer? In > > the case you guys were testing 0 was safe, but imagine a case where the > > monitoring was checking that the value didn't fall below some threshold. > > Dong Lin wrote: > The question is, when Rate.measure() is called right after > Rate.record(n), what should be the return value? I think there are two > possibilities: 0 and n/config.timeWindowMs(). I didn't find any use case > where these two values make a difference. > > Which value do you think is the best? > > Thank you. > > Aditya Auradkar wrote: > I think returning 0 is reasonable if no time has elapsed (technically). > As an alternate solution, what if we assumed that "elapsed" is always 1 > (at least). For example: > > double elapsed = convert(now - stat.oldest(now).lastWindowMs) + 1 > > In case of seconds, this basically means that you assume the current > second is always complete. This is only a problem (for a couple of seconds) > when all previous samples have zero activity or when the server is just > starting up. > > Jay Kreps wrote: > Actually there is a fundamental issue with the computation that patching > around the 0 case doesn't fix. That is the instability of the estimate. This > is the "taking a poll with sample size one" problem. > > Even if you patch the 0 case you still get a bad answer 1 ms later. That > is, let's say you get a single 50k request and your quota is 1MB/sec. > Currently at 0ms we estimate infinity which is in fact the measured rate but > obviously not a good estimate. But even 1 ms later the estimate is bad. > 50k*1000ms = ~50MB/sec. > > This is somewhat rare because it only happens when there is just one > sample. > > They key observation is that if a sample is missing, nothing happened in > that time period. But the calculation should still use that time period. > > So the right way to compute it, I think, is actually > ellapsed = (num_samples-1)*window_size + (now - current_sample.begin) > > For safety I think we should also require the number of samples to be >= > 2 and default it to 3. > > Dong Lin wrote: > Hi Jay: thanks for comments! > > I think the problem here is that, when event number is very small (e.g. > 0, 1), what value should rate.measure() return, right? If I understand your > solution formula right, when there is only one sample, your measured rate is > n/ellapsed = n/(now - current_sample.begin), which is exactly same as the > current code. I agree that requiring number of samples to be >= 2 solve the > problem. But what happens when user call rate.measure() when there is only 1 > sample? > > I agree with you that we still get a bad answer if we patch the 0 case. > How about we patch the 0 to timeWindowMs case: if (ellapsed < timeWindowMs) > then ellapsed = timeWindowMs. Does this solve the problem here? > > Dong Lin wrote: > I think we all agree that there is problem with rate measure when > ellapsed time is too small. What we need to properly define the rate measure > to handle such case, which is currently missing. > > If we want to require number of samples to be >= 2 as Jay suggested, we > can also let ellapsed = max(ellapsed, 2*timeWindowMs) and keep the rest of > the code, right?
If we require num samples greater than 2, we shoud simply change this condition in MetricConfig: if (samples < 1) throw new IllegalArgumentException("The number of samples must be at least 1."); Jay's equation is now similar to Dongs. If num samples = 3, ellapsed = (num_samples-1)*window_size + (now - current_sample.begin) ellapsed = 2*window_size + (now - current_sample.begin) - Aditya ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34170/#review83629 ----------------------------------------------------------- On May 14, 2015, 7:34 a.m., Dong Lin wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/34170/ > ----------------------------------------------------------- > > (Updated May 14, 2015, 7:34 a.m.) > > > Review request for kafka. > > > Bugs: KAFKA-2191 > https://issues.apache.org/jira/browse/KAFKA-2191 > > > Repository: kafka > > > Description > ------- > > KAFKA-2191; Measured rate should not be infinite > > > Diffs > ----- > > clients/src/main/java/org/apache/kafka/common/metrics/stats/Rate.java > 98429da34418f7f1deba1b5e44e2e6025212edb3 > > clients/src/main/java/org/apache/kafka/common/metrics/stats/SampledStat.java > b341b7daaa10204906d78b812fb05fd27bc69373 > > Diff: https://reviews.apache.org/r/34170/diff/ > > > Testing > ------- > > > Thanks, > > Dong Lin > >