Thanks for the background John. I am happy with `Windowed` and `Cumulative`, too.
-Matthias On 7/17/19 11:22 AM, Ryanne Dolan wrote: > John, makes sense to me! Thanks. > > Ryanne > > On Wed, Jul 17, 2019, 1:16 PM John Roesler <j...@confluent.io> wrote: > >> Agreed. I think the names are actually not ambiguous once you recall >> that the stats summarize measurements and each measurement is a >> floating point number, but there's enough overlap that I also was >> initially confused as well. I do plan to make this super clear in the >> documentation. >> >> Thanks, >> -John >> >> On Wed, Jul 17, 2019 at 1:08 PM Sophie Blee-Goldman <sop...@confluent.io> >> wrote: >>> >>> Sounds good to me >>> >>> By the way, while I agree that we can't really do better than Sum and >> Count >>> I will say I also found the distinction a bit unclear at first glance. We >>> should at least document clearly that "Sum" is a "sum of values" whereas >>> "Count" is a "number of things" -- but that doesn't need to be part of >> the >>> KIP >>> >>> On Wed, Jul 17, 2019 at 11:00 AM John Roesler <j...@confluent.io> wrote: >>> >>>> Thanks for the replies. >>>> >>>> I guess that if we did add (e.g.) ExponentiallyWeightedWindowedX or >>>> something, it should still be pretty obvious that WindowedX is the >>>> unweighted version? In that case, I buy the argument that we don't >>>> need "Simple" and we can just go with: >>>> >>>> WindowedSum, WindowedCount >>>> CumulativeSum, CumulativeCount >>>> >>>> Sound good? >>>> Thanks, >>>> -John >>>> >>>> On Wed, Jul 17, 2019 at 11:53 AM Sophie Blee-Goldman >>>> <sop...@confluent.io> wrote: >>>>> >>>>> Thanks for the crash course in statistical terms :) >>>>> >>>>> In light of this I definitely support Cumulative{Sum,Count}, but I'm >>>> really >>>>> not crazy about SimpleWindowed{Sum,Count} (vs just Windowed). Not so >> much >>>>> because of its unfortunate length (although that is unfortunate it >>>>> shouldn't be a deciding factor) but because it seems to have the >>>> potential >>>>> to confuse further. I'm not sure what we gain by adding "Simple" >> since to >>>>> me at least, the unweighted-ness is obvious and the definition of >> simple >>>> is >>>>> not. To those who haven't been exposed to the finer details of >>>> statistical >>>>> definitions, I think they are more likely to read "SimpleXX" and >> wonder >>>> "is >>>>> there an 'advanced' or non-simple kind of Windowed?" than they are to >>>>> wonder what is the weighting behind these metrics. >>>>> >>>>> Sophie >>>>> >>>>> On Wed, Jul 17, 2019 at 8:18 AM John Roesler <j...@confluent.io> >> wrote: >>>>> >>>>>> Thanks for the discussion, all. >>>>>> >>>>>> I've done a little more research into the statistical terminology. >>>>>> Matthias is correct, "running" and "moving" appear to be synonyms. >>>>>> Unfortunately, both can be computed either over a window of the >> last N >>>>>> measurements or over all prior measurements. "Moving" just >> signifies >>>>>> that the statistic is computed over a "live" data set, i.e., a >>>>>> continuous stream of measurements, and the expectation is that the >>>>>> stat would be updated in response to new measurements. >>>>>> >>>>>> I found https://en.wikipedia.org/wiki/Moving_average to have a >> pretty >>>>>> good overview of the whole picture. >>>>>> >>>>>> After considering the discussion so far and some light reading, it >>>>>> seems like "Cumulative" is truly the correct term for the all-time >>>>>> metrics: >>>>>> >>>>>>> In a cumulative moving average, the data arrive in an >>>>>>> ordered datum stream, and the user would like to get >>>>>>> the average of all of the data up until the current datum >>>>>>> point. >>>>>> >>>>>> I know that we previously felt that "cumulative" was too much of a >>>>>> mouthful, but it seems like our quest for a terser term led us >> into a >>>>>> briar patch. Also, now there is an independent source (the wiki >> page) >>>>>> indicating that this is indeed the correct term, and it doesn't >> offer >>>>>> any synonyms to choose from. Maybe we can take comfort in the fact >>>>>> that we'll rarely be saying the name of the classes out loud. >>>>>> >>>>>> As far as moving stats that operate over a window of the last N >>>>>> measurements, there are multiple options, including Simple >>>>>> (unweighted), Weighted, and Exponentially Weighted, and presumably >>>>>> infinite variations with other weighting functions. In our domain, >>>>>> there is only one weighting function available, but it's still more >>>>>> self-documenting and future-proof to specify the type of windowed >>>>>> statistic. Therefore, I'm proposing "Simple" as the term for the >>>>>> windowed (aka sampled) stats, while keeping Windowed in the name to >>>>>> distinguish it from the all-time metrics. >>>>>> >>>>>>> In financial applications a simple moving average (SMA) >>>>>>> is the unweighted mean of the previous n data. >>>>>> >>>>>> Therefore, we would have the proposed matrix: >>>>>> >>>>>> SimpleWindowedSum, SimpleWindowedCount >>>>>> CumulativeSum, CumulativeCount >>>>>> >>>>>> Again, all these proposed names are less pithy than we might wish, >> but >>>>>> the whole point of this exercise is to demystify and disambiguate >>>>>> them. It seems like the discussion so far illustrates the futility >> of >>>>>> trying to find names that are both short and descriptive. >>>>>> >>>>>> How does that sound? >>>>>> -John >>>>>> >>>>>> On Tue, Jul 16, 2019 at 4:43 PM Matthias J. Sax < >> matth...@confluent.io >>>>> >>>>>> wrote: >>>>>>> >>>>>>> It's a fair point that Ryanne raises. However, "running sum" is >> the >>>> same >>>>>>> as "moving sum" from my understanding. >>>>>>> >>>>>>> The issue is still, that `Sum` and `Count` which seem to be the >>>> cleanest >>>>>>> names cannot be used. While I agree that `TotalSum` and >> `TotalCount` >>>> is >>>>>>> somewhat redundant, I still think it the best suggestion so far. >>>>>>> >>>>>>> For the "sampled" version, I am personally fine with either >>>> `MovingXxx`, >>>>>>> `WindowedXxx`, or `RunningXxx` -- to me, that are all equally >> good to >>>>>>> describe the semantics. >>>>>>> >>>>>>> >>>>>>> >>>>>>> -Matthias >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 7/16/19 2:25 PM, Sophie Blee-Goldman wrote: >>>>>>>> I'm +1 on Windowed, was about to suggest that as I was >> catching up >>>> on >>>>>> the >>>>>>>> discussion but Bill beat me to it :) >>>>>>>> >>>>>>>> On Tue, Jul 16, 2019 at 2:23 PM Bill Bejeck <bbej...@gmail.com >>> >>>> wrote: >>>>>>>> >>>>>>>>> Hi John, >>>>>>>>> >>>>>>>>> Thanks for the updates. >>>>>>>>> >>>>>>>>> I like RunningCount and RunningSum. >>>>>>>>> >>>>>>>>> What about WindowedCount, WindowedSum instead of Moving? >>>>>>>>> I'm just throwing this out there as Windowed seems more >> intuitive >>>> to >>>>>> me, >>>>>>>>> but I'm not married to the idea. >>>>>>>>> >>>>>>>>> -Bill >>>>>>>>> >>>>>>>>> On Tue, Jul 16, 2019 at 5:09 PM John Roesler < >> j...@confluent.io> >>>>>> wrote: >>>>>>>>> >>>>>>>>>> No worries! Choosing good public API names is a high-impact >>>> design >>>>>>>>>> activity. >>>>>>>>>> >>>>>>>>>> Matthias, Bruno, Bill, and Stanislav, >>>>>>>>>> >>>>>>>>>> You've all contributed to this discussion or the vote so >> far... >>>> How >>>>>> do >>>>>>>>>> you feel about the proposed name change: >>>>>>>>>> >>>>>>>>>> MovingCount, MovingSum (instead of Sampled) >>>>>>>>>> RunningCount, RunningSum (Instead of Total) >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> -John >>>>>>>>>> >>>>>>>>>> On Tue, Jul 16, 2019 at 3:04 PM Ryanne Dolan < >>>> ryannedo...@gmail.com> >>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> John, that makes sense to me. Sorry for the bikeshedding. >>>>>>>>>>> >>>>>>>>>>> Ryanne >>>>>>>>>>> >>>>>>>>>>> On Tue, Jul 16, 2019 at 12:49 PM John Roesler < >>>> j...@confluent.io> >>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Thanks for the explanation and the suggestion, Ryanne, >>>>>>>>>>>> >>>>>>>>>>>> I went with "sampled" just because these are instances of >>>>>>>>> SampledStat, >>>>>>>>>>>> which in the Kafka Metrics ecosystem are computed from a >>>> window of >>>>>>>>>>>> recent samples. Thinking more about it, the fact that they >> are >>>>>>>>> sampled >>>>>>>>>>>> and the fact that they are windowed are orthogonal, which >> is >>>> what >>>>>>>>>>>> you're pointing out... sampling by itself doesn't indicate >> that >>>>>> it's >>>>>>>>> a >>>>>>>>>>>> moving average. >>>>>>>>>>>> >>>>>>>>>>>> Since there is no way in Kafka Metrics for a metric to be >>>> sampled >>>>>> and >>>>>>>>>>>> not windowed/moving/decaying, calling them Sampled would >> never >>>> be >>>>>>>>>>>> incorrect. But to someone unfamiliar with the code, it >> wouldn't >>>>>>>>>>>> immediately suggest the behavior of the metric that >> actually >>>>>> matters. >>>>>>>>>>>> That is, the behavior that distinguishes the two classes of >>>> metrics >>>>>>>>> we >>>>>>>>>>>> want to disambiguate here. >>>>>>>>>>>> >>>>>>>>>>>> It sounds like you'd suggest a new matrix of names: >>>>>>>>>>>> MovingCount, MovingSum >>>>>>>>>>>> RunningCount, RunningSum >>>>>>>>>>>> >>>>>>>>>>>> Are these names unambiguous and self explanatory? >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> -John >>>>>>>>>>>> >>>>>>>>>>>> On Tue, Jul 16, 2019 at 12:32 PM Ryanne Dolan < >>>>>> ryannedo...@gmail.com >>>>>>>>>> >>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> measurements, which decay/expire over time >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks John for the clarification. This was my >> (re-)reading >>>> of the >>>>>>>>>> code, >>>>>>>>>>>>> but this is not what I think of when I hear "sampled". In >>>> fact, >>>>>>>>>> you'll >>>>>>>>>>>>> notice that the Wikipedia pages for "Sample (statistics)" >> and >>>>>>>>> "Sample >>>>>>>>>>>>> (signal processing)" do not contain the words decay, >> expire, >>>>>>>>> recent, >>>>>>>>>>>>> history, or anything similar. >>>>>>>>>>>>> >>>>>>>>>>>>> Similar to "running", I'd suggest the more correct >> "moving", >>>> as in >>>>>>>>>>>> "moving >>>>>>>>>>>>> average" and "moving sum", which involve looking back N >>>> samples, >>>>>>>>>>>> applying a >>>>>>>>>>>>> sliding window, decaying over time etc. >>>>>>>>>>>>> >>>>>>>>>>>>> Ryanne >>>>>>>>>>>>> >>>>>>>>>>>>> On Tue, Jul 16, 2019, 11:58 AM John Roesler < >>>> j...@confluent.io> >>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks for raising this concern, Ryanne, >>>>>>>>>>>>>> >>>>>>>>>>>>>> "Sampled" indicates that the metrics is sampled, namely >> that >>>> we >>>>>>>>>>>>>> maintain a set of samples from recent value measurements, >>>> which >>>>>>>>>>>>>> decay/expire over time. So, the metric value is only >>>>>>>>>> representative of >>>>>>>>>>>>>> the recent past. >>>>>>>>>>>>>> >>>>>>>>>>>>>> "Total" indicates that the metric value contains all the >>>>>>>>>> information >>>>>>>>>>>>>> from the creation of the metric. For example., the total >> sum >>>>>>>>> would >>>>>>>>>>>>>> include all measurements since the app started up. >>>>>>>>>>>>>> >>>>>>>>>>>>>> It seems like your concern is that the word "total" >> doesn't >>>>>>>>> really >>>>>>>>>>>>>> pinpoint this meaning, which is true. It's especially >>>> confusing >>>>>>>>>> that >>>>>>>>>>>>>> another meaning of "total" is synonymous with "sum", >>>> rendering >>>>>>>>> the >>>>>>>>>>>>>> name "TotalSum" sort of absurd. >>>>>>>>>>>>>> >>>>>>>>>>>>>> We previously considered "cumulative", which was >> rejected as >>>> a >>>>>>>>>>>>>> mouthful (it's four syllables) . >>>>>>>>>>>>>> >>>>>>>>>>>>>> You mentioned "running", which might be a more >> appropriate >>>>>>>>> modifier >>>>>>>>>>>>>> (RunningSum and RunningCount). >>>>>>>>>>>>>> >>>>>>>>>>>>>> What would everyone think about that? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> -John >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Tue, Jul 16, 2019 at 11:27 AM Ryanne Dolan < >>>>>>>>>> ryannedo...@gmail.com> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> John, I mentioned on the VOTE thread that the proposed >> names >>>>>>>>> are >>>>>>>>>> a >>>>>>>>>>>> bit >>>>>>>>>>>>>>> confusing, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> given that "sum", "total", and "count" are roughly >>>>>>>>>> synonymous... >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> In particular, TotalSum is, I think, a "running total", >>>> though >>>>>>>>>> the >>>>>>>>>>>> naming >>>>>>>>>>>>>>> doesn't really capture that. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I think to avoid confusion, we should define exactly >> what >>>>>>>>>> "total" and >>>>>>>>>>>>>>> "sampled" are supposed to indicate, and perhaps pick >>>>>>>>> appropriate >>>>>>>>>>>> naming >>>>>>>>>>>>>>> from there. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Ryanne >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Fri, Jul 12, 2019 at 1:42 PM John Roesler < >>>>>>>>> j...@confluent.io> >>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hey, thanks Matthias and Bruno, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I agree, "Cumulative" is a mouthful. "TotalX" sounds >> fine >>>> to >>>>>>>>>> me. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Also, yes, I would have liked to not have any modifier >> for >>>>>>>>>>>>>>>> "non-sampled", but there is a name conflict with Sum. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I'll update the KIP to reflect "TotalX" and then start >> the >>>>>>>>> vote >>>>>>>>>>>> thread. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks again, >>>>>>>>>>>>>>>> -John >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Fri, Jul 12, 2019 at 11:27 AM Bruno Cadonna < >>>>>>>>>> br...@confluent.io >>>>>>>>>>>>> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> OK, makes sense. Then, I am in favour of TotalCount >> and >>>>>>>>>> TotalSum. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>> Bruno >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Fri, Jul 12, 2019 at 12:57 AM Matthias J. Sax < >>>>>>>>>>>>>> matth...@confluent.io> >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> `Sum` is an existing name, for the "sampled sum" >> metric, >>>>>>>>>> that >>>>>>>>>>>> gets >>>>>>>>>>>>>>>>>> deprecated. Hence, we cannot use it. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> If we cannot use `Sum` and use `TotalSum`, we should >> also >>>>>>>>>> not >>>>>>>>>>>> use >>>>>>>>>>>>>>>>>> `Count` but `TotalCount` for consistency. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> -Matthias >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On 7/11/19 12:58 PM, Bruno Cadonna wrote: >>>>>>>>>>>>>>>>>>> Hi John, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thank you for the KIP. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> LGTM >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I also do not like CumulativeSum/Count so much. I >>>>>>>>>> propose to >>>>>>>>>>>> just >>>>>>>>>>>>>>>> call >>>>>>>>>>>>>>>>>>> it Sum and Count. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I understand that you want to unequivocally >> distinguish >>>>>>>>>> the >>>>>>>>>>>> two >>>>>>>>>>>>>>>> metric >>>>>>>>>>>>>>>>>>> functions by their names, but I have the feeling the >>>>>>>>>> names >>>>>>>>>>>> become >>>>>>>>>>>>>>>>>>> artificially complex. The exact semantics can also >> be >>>>>>>>>>>> documented >>>>>>>>>>>>>> in >>>>>>>>>>>>>>>>>>> the javadocs, which btw could also be improved in >> those >>>>>>>>>>>> classes. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>>> Bruno >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Thu, Jul 11, 2019 at 8:25 PM Matthias J. Sax < >>>>>>>>>>>>>>>> matth...@confluent.io> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thanks for the KIP. Overall LGTM. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> The only though I have is, if we may want to use >>>>>>>>>> `TotalSum` >>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>>> `TotalCount` instead of `CumulativeSum/Count` as >>>>>>>>> names? >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> -Matthias >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On 7/11/19 9:31 AM, John Roesler wrote: >>>>>>>>>>>>>>>>>>>>> Hi Kafka devs, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I'd like to propose KIP-488 as a minor cleanup of >>>>>>>>> some >>>>>>>>>> of >>>>>>>>>>>> our >>>>>>>>>>>>>>>> metric >>>>>>>>>>>>>>>>>>>>> implementations. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> KIP-488: >>>>>>>>> https://cwiki.apache.org/confluence/x/kkAyBw >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Over time, iterative updates to these metrics has >>>>>>>>>> resulted >>>>>>>>>>>> in a >>>>>>>>>>>>>>>> pretty >>>>>>>>>>>>>>>>>>>>> confusing little collection of classes, and I've >>>>>>>>>> personally >>>>>>>>>>>>>> been >>>>>>>>>>>>>>>>>>>>> involved in three separate moderately >> time-consuming >>>>>>>>>>>>>> iterations of >>>>>>>>>>>>>>>> me >>>>>>>>>>>>>>>>>>>>> or someone else trying to work out which metrics >> are >>>>>>>>>>>>>> available, and >>>>>>>>>>>>>>>>>>>>> which ones are desired for a given use case. One >> of >>>>>>>>>> these >>>>>>>>>>>> was >>>>>>>>>>>>>>>> actually >>>>>>>>>>>>>>>>>>>>> a long-running bug in Kafka Streams' metrics, so >> not >>>>>>>>>> only >>>>>>>>>>>> has >>>>>>>>>>>>>> this >>>>>>>>>>>>>>>>>>>>> confusion been a time sink, but it has also led to >>>>>>>>>> bugs. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I'm hoping this change won't be too controversial. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>> -John >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>> >> >
signature.asc
Description: OpenPGP digital signature