Re: [DISCUSS] KIP-488: Clean up Sum,Count,Total Metrics

Bill Bejeck Tue, 16 Jul 2019 14:23:40 -0700

Hi John,

Thanks for the updates.


I like RunningCount and RunningSum.

What about WindowedCount, WindowedSum instead of Moving?
I'm just throwing this out there as Windowed seems more intuitive to me,
but I'm not married to the idea.

-Bill

On Tue, Jul 16, 2019 at 5:09 PM John Roesler <j...@confluent.io> wrote:

> No worries! Choosing good public API names is a high-impact design
> activity.
>
> Matthias, Bruno, Bill, and Stanislav,
>
> You've all contributed to this discussion or the vote so far... How do
> you feel about the proposed name change:
>
> MovingCount, MovingSum   (instead of Sampled)
> RunningCount, RunningSum   (Instead of Total)
>
> Thanks,
> -John
>
> On Tue, Jul 16, 2019 at 3:04 PM Ryanne Dolan <ryannedo...@gmail.com>
> wrote:
> >
> > John, that makes sense to me. Sorry for the bikeshedding.
> >
> > Ryanne
> >
> > On Tue, Jul 16, 2019 at 12:49 PM John Roesler <j...@confluent.io> wrote:
> >
> > > Thanks for the explanation and the suggestion, Ryanne,
> > >
> > > I went with "sampled" just because these are instances of SampledStat,
> > > which in the Kafka Metrics ecosystem are computed from a window of
> > > recent samples. Thinking more about it, the fact that they are sampled
> > > and the fact that they are windowed are orthogonal, which is what
> > > you're pointing out... sampling by itself doesn't indicate that it's a
> > > moving average.
> > >
> > > Since there is no way in Kafka Metrics for a metric to be sampled and
> > > not windowed/moving/decaying, calling them Sampled would never be
> > > incorrect. But to someone unfamiliar with the code, it wouldn't
> > > immediately suggest the behavior of the metric that actually matters.
> > > That is, the behavior that distinguishes the two classes of metrics we
> > > want to disambiguate here.
> > >
> > > It sounds like you'd suggest a new matrix of names:
> > > MovingCount, MovingSum
> > > RunningCount, RunningSum
> > >
> > > Are these names unambiguous and self explanatory?
> > >
> > > Thanks,
> > > -John
> > >
> > > On Tue, Jul 16, 2019 at 12:32 PM Ryanne Dolan <ryannedo...@gmail.com>
> > > wrote:
> > > >
> > > > > measurements, which decay/expire over time
> > > >
> > > > Thanks John for the clarification. This was my (re-)reading of the
> code,
> > > > but this is not what I think of when I hear "sampled". In fact,
> you'll
> > > > notice that the Wikipedia pages for "Sample (statistics)" and "Sample
> > > > (signal processing)" do not contain the words decay, expire, recent,
> > > > history, or anything similar.
> > > >
> > > > Similar to "running", I'd suggest the more correct "moving", as in
> > > "moving
> > > > average" and "moving sum", which involve looking back N samples,
> > > applying a
> > > > sliding window, decaying over time etc.
> > > >
> > > > Ryanne
> > > >
> > > > On Tue, Jul 16, 2019, 11:58 AM John Roesler <j...@confluent.io>
> wrote:
> > > >
> > > > > Thanks for raising this concern, Ryanne,
> > > > >
> > > > > "Sampled" indicates that the metrics is sampled, namely that we
> > > > > maintain a set of samples from recent value measurements, which
> > > > > decay/expire over time. So, the metric value is only
> representative of
> > > > > the recent past.
> > > > >
> > > > > "Total" indicates that the metric value contains all the
> information
> > > > > from the creation of the metric. For example., the total sum would
> > > > > include all measurements since the app started up.
> > > > >
> > > > > It seems like your concern is that the word "total" doesn't really
> > > > > pinpoint this meaning, which is true. It's especially confusing
> that
> > > > > another meaning of "total" is synonymous with "sum", rendering the
> > > > > name "TotalSum" sort of absurd.
> > > > >
> > > > > We previously considered "cumulative", which was rejected as a
> > > > > mouthful (it's four syllables) .
> > > > >
> > > > > You mentioned "running", which might be a more appropriate modifier
> > > > > (RunningSum and RunningCount).
> > > > >
> > > > > What would everyone think about that?
> > > > >
> > > > > Thanks,
> > > > > -John
> > > > >
> > > > > On Tue, Jul 16, 2019 at 11:27 AM Ryanne Dolan <
> ryannedo...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > John, I mentioned on the VOTE thread that the proposed names are
> a
> > > bit
> > > > > > confusing,
> > > > > >
> > > > > > > given that "sum", "total", and "count" are roughly
> synonymous...
> > > > > >
> > > > > > In particular, TotalSum is, I think, a "running total", though
> the
> > > naming
> > > > > > doesn't really capture that.
> > > > > >
> > > > > > I think to avoid confusion, we should define exactly what
> "total" and
> > > > > > "sampled" are supposed to indicate, and perhaps pick appropriate
> > > naming
> > > > > > from there.
> > > > > >
> > > > > > Ryanne
> > > > > >
> > > > > >
> > > > > > On Fri, Jul 12, 2019 at 1:42 PM John Roesler <j...@confluent.io>
> > > wrote:
> > > > > >
> > > > > > > Hey, thanks Matthias and Bruno,
> > > > > > >
> > > > > > > I agree, "Cumulative" is a mouthful. "TotalX" sounds fine to
> me.
> > > > > > >
> > > > > > > Also, yes, I would have liked to not have any modifier for
> > > > > > > "non-sampled", but there is a name conflict with Sum.
> > > > > > >
> > > > > > > I'll update the KIP to reflect "TotalX" and then start the vote
> > > thread.
> > > > > > >
> > > > > > > Thanks again,
> > > > > > > -John
> > > > > > >
> > > > > > > On Fri, Jul 12, 2019 at 11:27 AM Bruno Cadonna <
> br...@confluent.io
> > > >
> > > > > wrote:
> > > > > > > >
> > > > > > > > OK, makes sense. Then, I am in favour of TotalCount and
> TotalSum.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Bruno
> > > > > > > >
> > > > > > > > On Fri, Jul 12, 2019 at 12:57 AM Matthias J. Sax <
> > > > > matth...@confluent.io>
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > `Sum` is an existing name, for the "sampled sum" metric,
> that
> > > gets
> > > > > > > > > deprecated. Hence, we cannot use it.
> > > > > > > > >
> > > > > > > > > If we cannot use `Sum` and use `TotalSum`, we should also
> not
> > > use
> > > > > > > > > `Count` but `TotalCount` for consistency.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > -Matthias
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On 7/11/19 12:58 PM, Bruno Cadonna wrote:
> > > > > > > > > > Hi John,
> > > > > > > > > >
> > > > > > > > > > Thank you for the KIP.
> > > > > > > > > >
> > > > > > > > > > LGTM
> > > > > > > > > >
> > > > > > > > > > I also do not like CumulativeSum/Count so much. I
> propose to
> > > just
> > > > > > > call
> > > > > > > > > > it Sum and Count.
> > > > > > > > > >
> > > > > > > > > > I understand that you want to unequivocally distinguish
> the
> > > two
> > > > > > > metric
> > > > > > > > > > functions by their names, but I have the feeling the
> names
> > > become
> > > > > > > > > > artificially complex. The exact semantics can also be
> > > documented
> > > > > in
> > > > > > > > > > the javadocs, which btw could also be improved in those
> > > classes.
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Bruno
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Thu, Jul 11, 2019 at 8:25 PM Matthias J. Sax <
> > > > > > > matth...@confluent.io> wrote:
> > > > > > > > > >>
> > > > > > > > > >> Thanks for the KIP. Overall LGTM.
> > > > > > > > > >>
> > > > > > > > > >> The only though I have is, if we may want to use
> `TotalSum`
> > > and
> > > > > > > > > >> `TotalCount` instead of `CumulativeSum/Count` as names?
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >> -Matthias
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >> On 7/11/19 9:31 AM, John Roesler wrote:
> > > > > > > > > >>> Hi Kafka devs,
> > > > > > > > > >>>
> > > > > > > > > >>> I'd like to propose KIP-488 as a minor cleanup of some
> of
> > > our
> > > > > > > metric
> > > > > > > > > >>> implementations.
> > > > > > > > > >>>
> > > > > > > > > >>> KIP-488: https://cwiki.apache.org/confluence/x/kkAyBw
> > > > > > > > > >>>
> > > > > > > > > >>> Over time, iterative updates to these metrics has
> resulted
> > > in a
> > > > > > > pretty
> > > > > > > > > >>> confusing little collection of classes, and I've
> personally
> > > > > been
> > > > > > > > > >>> involved in three separate moderately time-consuming
> > > > > iterations of
> > > > > > > me
> > > > > > > > > >>> or someone else trying to work out which metrics are
> > > > > available, and
> > > > > > > > > >>> which ones are desired for a given use case. One of
> these
> > > was
> > > > > > > actually
> > > > > > > > > >>> a long-running bug in Kafka Streams' metrics, so not
> only
> > > has
> > > > > this
> > > > > > > > > >>> confusion been a time sink, but it has also led to
> bugs.
> > > > > > > > > >>>
> > > > > > > > > >>> I'm hoping this change won't be too controversial.
> > > > > > > > > >>>
> > > > > > > > > >>> Thanks,
> > > > > > > > > >>> -John
> > > > > > > > > >>>
> > > > > > > > > >>
> > > > > > > > >
> > > > > > >
> > > > >
> > > > >
> > >
>

Re: [DISCUSS] KIP-488: Clean up Sum,Count,Total Metrics

Reply via email to