Re: Review Request 34170: Patch for KAFKA-2191

Dong Lin Fri, 15 May 2015 13:29:48 -0700


> On May 13, 2015, 11:50 p.m., Aditya Auradkar wrote:
> > clients/src/main/java/org/apache/kafka/common/metrics/stats/SampledStat.java,
> >  line 45
> > <https://reviews.apache.org/r/34170/diff/2/?file=958605#file958605line45>
> >
> >     I think this is a good catch.
> >     
> >     Just so I understand, this is adding new samples for blank intervals of 
> > time when record was not called. As an optimization, I think you should 
> > only add "MetricConfig.samples" number of samples. In the rare case, there 
> > is no activity for 10 mins (say), this will add 10sec*6*10 = 600 samples 
> > which will be purged immediately on the next record call.
> 
> Jay Kreps wrote:
>     This is a really good point. It is totally possible for a metric to track 
> activity on a topic that has no writes for a month, the first write would 
> then cause you to cycle through a month of samples. The logic around 
> correctly skipping these windows and calculating the correct window boundry 
> needs to be carefully worked out.
> 
> Jay Kreps wrote:
>     Actually maybe you can elaborate on why this is needed at all? In the 
> current code if the current sample is complete we add a new sample. Your code 
> adds lots of samples. But why do we need that? Isn't purging obsolete samples 
> handled on measurement already? I think that is more elegant. You must see 
> some issue there, maybe you can explain?
> 
> Dong Lin wrote:
>     Adi: Good catch! I will update the patch to fix the problem.
>     
>     Jay: Sure. Here is the problem I find with the current code.
>     
>     - The current code implements SampledStat.advance in a way that is 
> probably different from users' expectation: typically when rate is sampled, 
> time will be divided into consecutive slots of equal length and samples are 
> expired in unit of these time slots, which assures the user the rate is 
> measured on samples collected in the past expireAge period. The current 
> implementation doesn't provide such an easy-to-understand interpretations, 
> since the effective sampled period can be anywhere between 0 and expireAge.
>     
>     - As one extreme example, say we call rate.measure after expireAge (i.e. 
> config.samples() * config.timeWindowMs()) has passed since last sample. Then 
> purgeObsoleteSamples will reset all samples and elapsed will be 0 as a 
> result. This is problematic: rate should instead keep the information that we 
> haven’t observed any sample during the past expireAge.
>     
>     Does it make sense?
> 
> Jay Kreps wrote:
>     Hey Dong, I think what you are saying is that the last sample is partial, 
> that is it doesn't cover a full window yet. This is true and is by design. 
> That is the whole reason there are multiple samples--to stabalize the partial 
> estimate. The ideal window would be a backwards looking sliding window of N 
> ms from the time of measurement but that would be computationally unfeasible.
>     
>     I don't understand your example. If no events have occurred in 
> config.samples() * config.timeWindowMs() then the observed rate is 0, right? 
> I think you are saying that the time estimate should be the full window, but 
> that is fixed by changing the other formula as I suggested.

Hey Jay, thanks for reply. The problem I wanted to explain with this example is 
that, if no events have occurred in config.samples() * config.timeWindowMs(), 
oldest == now and the measured rate will be infinite (which should be 0 
instead).

The example explained above is only relevant to the current code. As you said, 
using the other formula you suggested will fix the problem and avoid the need 
to make this change here.

Just a couple of questions towards an acceptable patch. How about we use the 
other formula you suggested in Rate.measure()? When the user calls 
rate.measure() with number of samples <= 1 (i.e. ellapsed <= timeWindowMs), 
should we throw exception or return n/timeWindowMs? And should we keep the 
existing SampledStat.oldest()?

Thank much,

- Dong

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34170/#review83687
-----------------------------------------------------------

On May 14, 2015, 7:34 a.m., Dong Lin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34170/
> -----------------------------------------------------------
> 
> (Updated May 14, 2015, 7:34 a.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-2191
>     https://issues.apache.org/jira/browse/KAFKA-2191
> 
> 
> Repository: kafka
> 
> 
> Description
> -------
> 
> KAFKA-2191; Measured rate should not be infinite
> 
> 
> Diffs
> -----
> 
>   clients/src/main/java/org/apache/kafka/common/metrics/stats/Rate.java 
> 98429da34418f7f1deba1b5e44e2e6025212edb3 
>   
> clients/src/main/java/org/apache/kafka/common/metrics/stats/SampledStat.java 
> b341b7daaa10204906d78b812fb05fd27bc69373 
> 
> Diff: https://reviews.apache.org/r/34170/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Dong Lin
> 
>

Re: Review Request 34170: Patch for KAFKA-2191

Reply via email to