Den fre 20 maj 2022 kl 01:23 skrev Jun Rao <j...@confluent.io.invalid>:
> Hi, Magus, > > Thanks for the reply. > > 50. Sounds good. > > 51. I miss-understood the proposal in the KIP then. The proposal is to > define a set of common metric names that every client should implement. The > problem is that every client already has its own set of metrics with its > own names. I am not sure that we could easily agree upon a common set of > metrics that work with all clients. There are likely to be some metrics > that are client specific. Translating between the common name and client > specific name is probably going to add more confusion. As mentioned in the > KIP, similar metrics from different clients could have subtle > semantic differences. Could we just let each client use its own set of > metric names? > We identified a common set of metrics that should be relevant for most client implementations, they're the ones listed in the KIP. A supporting client does not have to implement all those metrics, only the ones that makes sense based on that client implementation, and a client may implement other metrics that are not listed in the KIP under its own namespace. This approach has two benefits: - there will be a common set of metrics that most/all clients implement, which makes monitoring and troubleshooting easier across fleets with multiple Kafka client languages/implementations. - client-specific metrics are still possible, so if there is no suitable standard metric a client can still provide what special metrics it has. Thanks, Magnus On Thu, May 19, 2022 at 10:39 AM Magnus Edenhill <mag...@edenhill.se> wrote: > > > Den ons 18 maj 2022 kl 19:57 skrev Jun Rao <j...@confluent.io.invalid>: > > > > > Hi, Magnus, > > > > > > > Hi Jun > > > > > > > > > > Thanks for the updated KIP. Just a couple of more comments. > > > > > > 50. To troubleshoot a particular client issue, I imagine that the > client > > > needs to identify its client_instance_id. How does the client find this > > > out? Do we plan to include client_instance_id in the client log, expose > > it > > > as a metric or something else? > > > > > > > The KIP suggests that client implementations emit an informative log > > message > > with the assigned client-instance-id once it is retrieved (once per > client > > instance lifetime). > > There's also a clientInstanceId() method that an application can use to > > retrieve > > the client instance id and emit through whatever side channels makes > sense. > > > > > > > > > 51. The KIP lists a bunch of metrics that need to be collected at the > > > client side. However, it seems quite a few useful java client metrics > > like > > > the following are missing. > > > buffer-total-bytes > > > buffer-available-bytes > > > > > > > These are covered by client.producer.record.queue.bytes and > > client.producer.record.queue.max.bytes. > > > > > > > bufferpool-wait-time > > > > > > > Missing, but somewhat implementation specific. > > If it was up to me we would add this later if there's a need. > > > > > > > > > batch-size-avg > > > batch-size-max > > > > > > > These are missing and would be suitably represented as a histogram. I'll > > add them. > > > > > > > > > io-wait-ratio > > > io-ratio > > > > > > > There's client.io.wait.time which should cover io-wait-ratio. > > We could add a client.io.time as well, now or in a later KIP. > > > > Thanks, > > Magnus > > > > > > > > > > > > > > Thanks, > > > > > > Jun > > > > > > On Mon, Apr 4, 2022 at 10:01 AM Jun Rao <j...@confluent.io> wrote: > > > > > > > Hi, Xavier, > > > > > > > > Thanks for the reply. > > > > > > > > 28. It does seem that we have started using KafkaMetrics on the > broker > > > > side. Then, my only concern is on the usage of Histogram in > > KafkaMetrics. > > > > Histogram in KafkaMetrics statically divides the value space into a > > fixed > > > > number of buckets and only returns values on the bucket boundary. So, > > the > > > > returned histogram value may never show up in a recorded value. > Yammer > > > > Histogram, on the other hand, uses reservoir sampling. The reported > > value > > > > is always one of the recorded values. So, I am not sure that > Histogram > > in > > > > KafkaMetrics is as good as Yammer Histogram. > > > ClientMetricsPluginExportTime > > > > uses Histogram. > > > > > > > > Thanks, > > > > > > > > Jun > > > > > > > > On Thu, Mar 31, 2022 at 5:21 PM Xavier Léauté > > > <xav...@confluent.io.invalid> > > > > wrote: > > > > > > > >> > > > > >> > 28. On the broker, we typically use Yammer metrics. Only for > metrics > > > >> that > > > >> > depend on Kafka metric features (e.g., quota), we use the Kafka > > > metric. > > > >> > Yammer metrics have 4 types: gauge, meter, histogram and timer. > > meter > > > >> > calculates a rate, but also exposes an accumulated value. > > > >> > > > > >> > > > >> I don't see a good reason we should limit ourselves to Yammer > metrics > > on > > > >> the broker. KafkaMetrics was written > > > >> to replace Yammer metrics and is used for all new components > (clients, > > > >> streams, connect, etc.) > > > >> My understanding is that the original goal was to retire Yammer > > metrics > > > in > > > >> the broker in favor of KafkaMetrics. > > > >> We just haven't done so out of backwards compatibility concerns. > > > >> There are other broker metrics such as group coordinator, > transaction > > > >> state > > > >> manager, and various socket server metrics > > > >> already using KafkaMetrics that don't need specific Kafka metric > > > features, > > > >> so I don't see why we should refrain from using > > > >> Kafka metrics on the broker unless there are real compatibility > > concerns > > > >> or > > > >> where implementation specifics could lead to confusion when > comparing > > > >> metrics using different implementations. > > > >> > > > >> In my opinion we should encourage people to use KafkaMetrics going > > > forward > > > >> on the broker as well, for two reasons: > > > >> a) yammer metrics is long deprecated and no longer maintained > > > >> b) yammer metrics are much less expressive > > > >> c) we don't have a proper API to expose yammer metrics outside of > JMX > > > >> (MetricsReporter only exposes KafkaMetrics) > > > >> > > > > > > > > > >