Re: [DISCUSS] KIP-714: Client metrics and observability

Jun Rao Wed, 01 Jun 2022 09:55:50 -0700

Hi, Magnus,

51. Just to clarify my question.  (1) Are standard metrics required for
every client for this KIP to function?  (2) Are we converting existing java
metrics to the standard metrics and deprecating the old ones? If so, could
we list all existing java metrics that need to be renamed and the
corresponding new name?


Thanks,

Jun

On Tue, May 31, 2022 at 3:29 PM Jun Rao <j...@confluent.io> wrote:

> Hi, Magnus,
>
> Thanks for the reply.
>
> 51. I think it's fine to have a list of recommended metrics for every
> client to implement. I am just not sure that standardizing on the metric
> names across all clients is practical. The list of common metrics in the
> KIP have completely different names from the java metric names. Some of
> them have different types. For example, some of the common metrics have a
> type of histogram, but the java client metrics don't use histogram in
> general. Requiring the operator to translate those names and understand the
> subtle differences across clients seem to cause more confusion during
> troubleshooting.
>
> Thanks,
>
> Jun
>
> On Tue, May 31, 2022 at 5:02 AM Magnus Edenhill <mag...@edenhill.se>
> wrote:
>
>> Den fre 20 maj 2022 kl 01:23 skrev Jun Rao <j...@confluent.io.invalid>:
>>
>> > Hi, Magus,
>> >
>> > Thanks for the reply.
>> >
>> > 50. Sounds good.
>> >
>> > 51. I miss-understood the proposal in the KIP then. The proposal is to
>> > define a set of common metric names that every client should implement.
>> The
>> > problem is that every client already has its own set of metrics with its
>> > own names. I am not sure that we could easily agree upon a common set of
>> > metrics that work with all clients. There are likely to be some metrics
>> > that are client specific. Translating between the common name and client
>> > specific name is probably going to add more confusion. As mentioned in
>> the
>> > KIP, similar metrics from different clients could have subtle
>> > semantic differences. Could we just let each client use its own set of
>> > metric names?
>> >
>>
>> We identified a common set of metrics that should be relevant for most
>> client implementations,
>> they're the ones listed in the KIP.
>> A supporting client does not have to implement all those metrics, only the
>> ones that makes sense
>> based on that client implementation, and a client may implement other
>> metrics that are not listed
>> in the KIP under its own namespace.
>> This approach has two benefits:
>>  - there will be a common set of metrics that most/all clients implement,
>> which makes monitoring
>>   and troubleshooting easier across fleets with multiple Kafka client
>> languages/implementations.
>>  - client-specific metrics are still possible, so if there is no suitable
>> standard metric a client can still
>>    provide what special metrics it has.
>>
>>
>> Thanks,
>> Magnus
>>
>>
>> On Thu, May 19, 2022 at 10:39 AM Magnus Edenhill <mag...@edenhill.se>
>> wrote:
>> >
>> > > Den ons 18 maj 2022 kl 19:57 skrev Jun Rao <j...@confluent.io.invalid
>> >:
>> > >
>> > > > Hi, Magnus,
>> > > >
>> > >
>> > > Hi Jun
>> > >
>> > >
>> > > >
>> > > > Thanks for the updated KIP. Just a couple of more comments.
>> > > >
>> > > > 50. To troubleshoot a particular client issue, I imagine that the
>> > client
>> > > > needs to identify its client_instance_id. How does the client find
>> this
>> > > > out? Do we plan to include client_instance_id in the client log,
>> expose
>> > > it
>> > > > as a metric or something else?
>> > > >
>> > >
>> > > The KIP suggests that client implementations emit an informative log
>> > > message
>> > > with the assigned client-instance-id once it is retrieved (once per
>> > client
>> > > instance lifetime).
>> > > There's also a clientInstanceId() method that an application can use
>> to
>> > > retrieve
>> > > the client instance id and emit through whatever side channels makes
>> > sense.
>> > >
>> > >
>> > >
>> > > > 51. The KIP lists a bunch of metrics that need to be collected at
>> the
>> > > > client side. However, it seems quite a few useful java client
>> metrics
>> > > like
>> > > > the following are missing.
>> > > >     buffer-total-bytes
>> > > >     buffer-available-bytes
>> > > >
>> > >
>> > > These are covered by client.producer.record.queue.bytes and
>> > > client.producer.record.queue.max.bytes.
>> > >
>> > >
>> > > >     bufferpool-wait-time
>> > > >
>> > >
>> > > Missing, but somewhat implementation specific.
>> > > If it was up to me we would add this later if there's a need.
>> > >
>> > >
>> > >
>> > > >     batch-size-avg
>> > > >     batch-size-max
>> > > >
>> > >
>> > > These are missing and would be suitably represented as a histogram.
>> I'll
>> > > add them.
>> > >
>> > >
>> > >
>> > > >     io-wait-ratio
>> > > >     io-ratio
>> > > >
>> > >
>> > > There's client.io.wait.time which should cover io-wait-ratio.
>> > > We could add a client.io.time as well, now or in a later KIP.
>> > >
>> > > Thanks,
>> > > Magnus
>> > >
>> > >
>> > >
>> > >
>> > > >
>> > > > Thanks,
>> > > >
>> > > > Jun
>> > > >
>> > > > On Mon, Apr 4, 2022 at 10:01 AM Jun Rao <j...@confluent.io> wrote:
>> > > >
>> > > > > Hi, Xavier,
>> > > > >
>> > > > > Thanks for the reply.
>> > > > >
>> > > > > 28. It does seem that we have started using KafkaMetrics on the
>> > broker
>> > > > > side. Then, my only concern is on the usage of Histogram in
>> > > KafkaMetrics.
>> > > > > Histogram in KafkaMetrics statically divides the value space into
>> a
>> > > fixed
>> > > > > number of buckets and only returns values on the bucket boundary.
>> So,
>> > > the
>> > > > > returned histogram value may never show up in a recorded value.
>> > Yammer
>> > > > > Histogram, on the other hand, uses reservoir sampling. The
>> reported
>> > > value
>> > > > > is always one of the recorded values. So, I am not sure that
>> > Histogram
>> > > in
>> > > > > KafkaMetrics is as good as Yammer Histogram.
>> > > > ClientMetricsPluginExportTime
>> > > > > uses Histogram.
>> > > > >
>> > > > > Thanks,
>> > > > >
>> > > > > Jun
>> > > > >
>> > > > > On Thu, Mar 31, 2022 at 5:21 PM Xavier Léauté
>> > > > <xav...@confluent.io.invalid>
>> > > > > wrote:
>> > > > >
>> > > > >> >
>> > > > >> > 28. On the broker, we typically use Yammer metrics. Only for
>> > metrics
>> > > > >> that
>> > > > >> > depend on Kafka metric features (e.g., quota), we use the Kafka
>> > > > metric.
>> > > > >> > Yammer metrics have 4 types: gauge, meter, histogram and timer.
>> > > meter
>> > > > >> > calculates a rate, but also exposes an accumulated value.
>> > > > >> >
>> > > > >>
>> > > > >> I don't see a good reason we should limit ourselves to Yammer
>> > metrics
>> > > on
>> > > > >> the broker. KafkaMetrics was written
>> > > > >> to replace Yammer metrics and is used for all new components
>> > (clients,
>> > > > >> streams, connect, etc.)
>> > > > >> My understanding is that the original goal was to retire Yammer
>> > > metrics
>> > > > in
>> > > > >> the broker in favor of KafkaMetrics.
>> > > > >> We just haven't done so out of backwards compatibility concerns.
>> > > > >> There are other broker metrics such as group coordinator,
>> > transaction
>> > > > >> state
>> > > > >> manager, and various socket server metrics
>> > > > >> already using KafkaMetrics that don't need specific Kafka metric
>> > > > features,
>> > > > >> so I don't see why we should refrain from using
>> > > > >> Kafka metrics on the broker unless there are real compatibility
>> > > concerns
>> > > > >> or
>> > > > >> where implementation specifics could lead to confusion when
>> > comparing
>> > > > >> metrics using different implementations.
>> > > > >>
>> > > > >> In my opinion we should encourage people to use KafkaMetrics
>> going
>> > > > forward
>> > > > >> on the broker as well, for two reasons:
>> > > > >> a) yammer metrics is long deprecated and no longer maintained
>> > > > >> b) yammer metrics are much less expressive
>> > > > >> c) we don't have a proper API to expose yammer metrics outside of
>> > JMX
>> > > > >> (MetricsReporter only exposes KafkaMetrics)
>> > > > >>
>> > > > >
>> > > >
>> > >
>> >
>>
>

Re: [DISCUSS] KIP-714: Client metrics and observability

Reply via email to