Hello, Dylan.

> At larger scales (e.g., thousands+ of partitions and hundreds+ of consumer
> groups) the cardinality of metrics is very high for a broker and very
> challenging for a metrics collector to pull out of JMX.
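On that cardinality point: the broker already has `metrics.jmx.include` and `metrics.jmx.exclude` properties (regular-expression filters over metric names) that can suppress the noisiest MBeans before a collector ever sees them. A hypothetical `server.properties` fragment; the patterns are illustrative only, and the exact names a given deployment should filter must be checked against what its collector actually scrapes:

```properties
# Illustrative sketch: drop high-cardinality per-partition and
# per-consumer-group metric names from JMX, keep everything else.
# The regexes below are examples, not a recommended production filter.
metrics.jmx.exclude=.*partition=.*
metrics.jmx.include=.*
```

Note that `exclude` wins over `include` when both match, so a broad `include` with targeted `exclude` patterns is the usual shape for this kind of filter.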
Agreed.

0. Kafka has the `metrics.jmx.exclude` and `metrics.jmx.include` properties to reduce the metric count if required.
1. We should improve the JMX exporter, or develop a new one if the existing one can't expose what is required, shouldn't we?

> On 16 Feb 2022, at 18:47, Meissner, Dylan
> <dylan.t.meiss...@nordstrom.com.INVALID> wrote:
>
> It would be very convenient for consumer applications that are not collecting
> and shipping their own metrics to have the Kafka cluster do this for them.
>
> At larger scales (e.g., thousands+ of partitions and hundreds+ of consumer
> groups) the cardinality of metrics is very high for a broker and very
> challenging for a metrics collector to pull out of JMX. Consumer groups
> specifically often see randomly generated ids which, depending on the value of
> the broker's offsets.retention config, can be represented for days and weeks.
>
> KIP-714 is significant for reporting lag at larger scales and can skip the
> broker's JMX entirely. The client is already collecting
> consumer-fetch-manager-metrics, can report them to the cluster, the
> broker can feed those metrics to subscriptions, and this "just works" without
> new code in the group coordinator.
>
> ________________________________
> From: Николай Ижиков <nizhikov....@gmail.com> on behalf of Nikolay Izhikov
> <nizhi...@apache.org>
> Sent: Wednesday, February 16, 2022 7:11 AM
> To: dev@kafka.apache.org <dev@kafka.apache.org>
> Subject: Re: [DISCUSSION] New broker metric. Per partition consumer offset
>
> Chris, thanks for the support.
>
> Dear Kafka committers, can you please advise me:
>
> Do you support my proposal?
> Can I implement the new metrics in the scope of a separate KIP?
>
> KIP-714 seems to me a much more complex improvement.
> Moreover, it has a similar but slightly different goal.
>
> All I propose is to expose existing offset data as metrics on the broker side.
>
>> On 16 Feb
>> 2022, at 17:52, Chris Egerton <chr...@confluent.io.INVALID> wrote:
>>
>> Hi Nikolay,
>>
>> Yep, makes sense to me 👍
>>
>> Sounds like the motivation here is similar to KIP-714 [1], which allows
>> clients to publish their own metrics directly to a broker. It seems like
>> one reason this use case isn't already addressed in that KIP is that, if
>> all you're doing is taking the delta between a consumer group's
>> latest-committed offsets and the latest stable offsets (LSO) for a set of
>> topic partitions, none of that requires the consumer to directly publish
>> metrics to the broker instead of implicitly updating that metric by
>> committing offsets. In short, as you've noted, that data is already
>> available on the broker.
>>
>> I think you make a reasonable case and, coupled with the precedent set by
>> KIP-714 (which, though not yet accepted, seems to have significant traction
>> at the moment), it'd be nice to see these metrics available broker-side.
>>
>> I do wonder if there's a question about where the line should be drawn for
>> other client metrics, but will leave that to people more familiar with
>> broker logic to think through.
>>
>> [1] -
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability#KIP714:Clientmetricsandobservability-Motivation
>>
>> Cheers,
>>
>> Chris
>>
>> On Wed, Feb 16, 2022 at 9:23 AM Nikolay Izhikov <nizhi...@apache.org> wrote:
>>
>>> Hello, Chris.
>>>
>>> Thanks for the feedback.
>>>
>>>> Have you seen the consumer-side lag metrics [1]? "records-lag",
>>>
>>> Yes, I'm aware of these metrics.
>>>
>>>> If so, I'd be curious to know what the motivation for duplicating
>>>> existing client metrics onto brokers would be?
>>>
>>> It can be a complex task to set up and access monitoring data for all
>>> consumers.
>>> Clients can be new, experimental, and not integrated into the company's
>>> monitoring solution.
>>> Instances can come and go; clients can change addresses, etc.,
>>> based on some circumstances not related to Kafka.
>>>
>>> I think it would be useful to have per-partition consumer offset metrics on
>>> the broker side.
>>> It would allow a Kafka administrator to collect monitoring data in one
>>> place.
>>>
>>> Moreover, this data is already available on the broker.
>>> All we need is to expose it.
>>>
>>> Does that make sense to you?
>>>
>>>> On 16 Feb 2022, at 17:01, Chris Egerton <chr...@confluent.io.INVALID>
>>>> wrote:
>>>>
>>>> Hi Nikolay,
>>>>
>>>> Have you seen the consumer-side lag metrics [1]? "records-lag",
>>>> "records-lag-avg", "records-lag-max" all give lag stats on a
>>>> per-topic-partition basis.
>>>>
>>>> If so, I'd be curious to know what the motivation for duplicating
>>>> existing client metrics onto brokers would be?
>>>>
>>>> [1]
>>>> https://kafka.apache.org/31/documentation.html#consumer_fetch_monitoring
>>>>
>>>> Cheers,
>>>>
>>>> Chris
>>>>
>>>> On Wed, Feb 16, 2022 at 4:38 AM Nikolay Izhikov <nizhi...@apache.org>
>>>> wrote:
>>>>
>>>>> Hello, Kafka team.
>>>>>
>>>>> When running in production, a common user question is "How big is the lag
>>>>> between producer and consumer?".
>>>>> We have the `kafka-consumer-groups.sh` tool and
>>>>> `AdminClient#getListConsumerGroupOffsetsCall` to answer the question.
>>>>>
>>>>> Detailed guides even exist on how to calculate *consumer lag* with
>>>>> built-in Kafka tools. [1]
>>>>>
>>>>> Obviously, the approach with a tool or AdminClient requires additional
>>>>> coding and setup, which can be inconvenient.
>>>>>
>>>>> I think Kafka should provide a per-partition consumer offset metric.
>>>>> It would simplify running and monitoring a Kafka deployment in production.
>>>>>
>>>>> I looked into `GroupMetadataManager.scala` and think it is possible to add
>>>>> those metrics.
>>>>>
>>>>> What do you think?
>>>>> Do we need these metrics on the Kafka broker?
>>>>>
>>>>> [1] https://www.baeldung.com/java-kafka-consumer-lag
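For readers following the thread: the lag computation under discussion reduces to a subtraction once the two offsets are in hand, whether it is done by the consumer, by an AdminClient-based tool, or by a hypothetical broker-side metric. A minimal, dependency-free sketch of that arithmetic; the class and method names are illustrative (this is not the actual `GroupMetadataManager` code), and real code would key on `TopicPartition` rather than strings:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: per-partition consumer lag as
// lag = log-end offset - last committed offset, floored at zero.
public class ConsumerLagSketch {

    /**
     * For every partition with a committed offset, compute
     * max(logEndOffset - committedOffset, 0). Partitions without a
     * known log-end offset are skipped.
     */
    public static Map<String, Long> perPartitionLag(
            Map<String, Long> committedOffsets,  // e.g. "orders-0" -> 1500
            Map<String, Long> logEndOffsets) {   // e.g. "orders-0" -> 1750
        Map<String, Long> lag = new HashMap<>();
        for (Map.Entry<String, Long> e : committedOffsets.entrySet()) {
            Long end = logEndOffsets.get(e.getKey());
            if (end != null) {
                lag.put(e.getKey(), Math.max(end - e.getValue(), 0L));
            }
        }
        return lag;
    }

    public static void main(String[] args) {
        Map<String, Long> committed = Map.of("orders-0", 1500L, "orders-1", 900L);
        Map<String, Long> ends = Map.of("orders-0", 1750L, "orders-1", 900L);
        // orders-0 lags by 250 records, orders-1 is fully caught up (0).
        System.out.println(perPartitionLag(committed, ends));
    }
}
```

In a real tool the committed offsets would come from `Admin#listConsumerGroupOffsets` and the end offsets from `Consumer#endOffsets` (or `Admin#listOffsets`); the point of the proposal in this thread is that the broker already holds both sides of the subtraction.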