Hello, Dylan.

> At larger scales (e.g., thousands+ of partitions and hundreds+ of consumer
> groups) the cardinality of metrics is very high for a broker and very
> challenging for a metrics collector to pull out of JMX.
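On that cardinality point: the broker already has `metrics.jmx.include` and `metrics.jmx.exclude` properties (regular-expression filters over metric names) that can suppress the noisiest MBeans before a collector ever sees them. A hypothetical `server.properties` fragment; the patterns are illustrative only, and the exact names a given deployment should filter must be checked against what its collector actually scrapes:

```properties
# Illustrative sketch: drop high-cardinality per-partition and
# per-consumer-group metric names from JMX, keep everything else.
# The regexes below are examples, not a recommended production filter.
metrics.jmx.exclude=.*partition=.*
metrics.jmx.include=.*
```

Note that `exclude` wins over `include` when both match, so a broad `include` with targeted `exclude` patterns is the usual shape for this kind of filter.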
Agreed.

0. Kafka has the `metrics.jmx.exclude` and `metrics.jmx.include` properties to reduce the metric count if required.
1. We should improve the JMX exporter, or develop a new one if the existing one can't expose what is required, shouldn't we?

> On 16 Feb 2022, at 18:47, Meissner, Dylan
> <dylan.t.meiss...@nordstrom.com.INVALID> wrote:
>
> It would be very convenient for consumer applications that are not collecting
> and shipping their own metrics to have the Kafka cluster do this for them.
>
> At larger scales (e.g., thousands+ of partitions and hundreds+ of consumer
> groups) the cardinality of metrics is very high for a broker and very
> challenging for a metrics collector to pull out of JMX. Consumer groups
> specifically often see randomly generated ids which, depending on the value of
> the broker's offsets.retention config, can be represented for days and weeks.
>
> KIP-714 is significant for reporting lag at larger scales and can skip the
> broker's JMX entirely. The client is already collecting
> consumer-fetch-manager-metrics, can report them to the cluster, the
> broker can feed those metrics to subscriptions, and this "just works" without
> new code in the group coordinator.
>
> ________________________________
> From: Николай Ижиков <nizhikov....@gmail.com> on behalf of Nikolay Izhikov
> <nizhi...@apache.org>
> Sent: Wednesday, February 16, 2022 7:11 AM
> To: dev@kafka.apache.org <dev@kafka.apache.org>
> Subject: Re: [DISCUSSION] New broker metric. Per partition consumer offset
>
> Chris, thanks for the support.
>
> Dear Kafka committers, can you please advise me:
>
> Do you support my proposal?
> Can I implement the new metrics in the scope of a separate KIP?
>
> KIP-714 seems to me a much more complex improvement.
> Moreover, it has a similar but slightly different goal.
>
> All I propose is to expose existing offset data as metrics on the broker side.
>
>> On 16 Feb
>> 2022, at 17:52, Chris Egerton <chr...@confluent.io.INVALID> wrote:
>>
>> Hi Nikolay,
>>
>> Yep, makes sense to me 👍
>>
>> Sounds like the motivation here is similar to KIP-714 [1], which allows
>> clients to publish their own metrics directly to a broker. It seems like
>> one reason this use case isn't already addressed in that KIP is that, if
>> all you're doing is taking the delta between a consumer group's
>> latest-committed offsets and the latest stable offsets (LSO) for a set of
>> topic partitions, none of that requires the consumer to directly publish
>> metrics to the broker instead of implicitly updating that metric by
>> committing offsets. In short, as you've noted, that data is already
>> available on the broker.
>>
>> I think you make a reasonable case and, coupled with the precedent set by
>> KIP-714 (which, though not yet accepted, seems to have significant traction
>> at the moment), it'd be nice to see these metrics available broker-side.
>>
>> I do wonder if there's a question about where the line should be drawn for
>> other client metrics, but will leave that to people more familiar with
>> broker logic to think through.
>>
>> [1] -
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability#KIP714:Clientmetricsandobservability-Motivation
>>
>> Cheers,
>>
>> Chris
>>
>> On Wed, Feb 16, 2022 at 9:23 AM Nikolay Izhikov <nizhi...@apache.org> wrote:
>>
>>> Hello, Chris.
>>>
>>> Thanks for the feedback.
>>>
>>>> Have you seen the consumer-side lag metrics [1]? "records-lag",
>>>
>>> Yes, I'm aware of these metrics.
>>>
>>>> If so, I'd be curious to know what the motivation for duplicating
>>>> existing client metrics onto brokers would be?
>>>
>>> It can be a complex task to set up and access monitoring data for all
>>> consumers.
>>> Clients can be new, experimental, and not integrated into the company's
>>> monitoring solution.
>>> Instances can come and go; clients can change addresses, etc.,
>>> based on some circumstances not related to Kafka.
>>>
>>> I think it would be useful to have per-partition consumer offset metrics on
>>> the broker side.
>>> It would allow a Kafka administrator to collect monitoring data in one
>>> place.
>>>
>>> Moreover, this data is already available on the broker.
>>> All we need is to expose it.
>>>
>>> Does that make sense to you?
>>>
>>>> On 16 Feb 2022, at 17:01, Chris Egerton <chr...@confluent.io.INVALID>
>>>> wrote:
>>>>
>>>> Hi Nikolay,
>>>>
>>>> Have you seen the consumer-side lag metrics [1]? "records-lag",
>>>> "records-lag-avg", "records-lag-max" all give lag stats on a
>>>> per-topic-partition basis.
>>>>
>>>> If so, I'd be curious to know what the motivation for duplicating
>>>> existing client metrics onto brokers would be?
>>>>
>>>> [1]
>>>> https://kafka.apache.org/31/documentation.html#consumer_fetch_monitoring
>>>>
>>>> Cheers,
>>>>
>>>> Chris
>>>>
>>>> On Wed, Feb 16, 2022 at 4:38 AM Nikolay Izhikov <nizhi...@apache.org>
>>>> wrote:
>>>>
>>>>> Hello, Kafka team.
>>>>>
>>>>> When running in production, a common user question is "How big is the lag
>>>>> between producer and consumer?".
>>>>> We have the `kafka-consumer-groups.sh` tool and
>>>>> `AdminClient#getListConsumerGroupOffsetsCall` to answer the question.
>>>>>
>>>>> Detailed guides even exist on how to calculate *consumer lag* with
>>>>> built-in Kafka tools. [1]
>>>>>
>>>>> Obviously, the approach with a tool or AdminClient requires additional
>>>>> coding and setup, which can be inconvenient.
>>>>>
>>>>> I think Kafka should provide a per-partition consumer offset metric.
>>>>> It would simplify running and monitoring a Kafka deployment in production.
>>>>>
>>>>> I looked into `GroupMetadataManager.scala` and think it is possible to add
>>>>> those metrics.
>>>>>
>>>>> What do you think?
>>>>> Do we need these metrics on the Kafka broker?
>>>>>
>>>>> [1] https://www.baeldung.com/java-kafka-consumer-lag
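For readers following the thread: the lag computation under discussion reduces to a subtraction once the two offsets are in hand, whether it is done by the consumer, by an AdminClient-based tool, or by a hypothetical broker-side metric. A minimal, dependency-free sketch of that arithmetic; the class and method names are illustrative (this is not the actual `GroupMetadataManager` code), and real code would key on `TopicPartition` rather than strings:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: per-partition consumer lag as
// lag = log-end offset - last committed offset, floored at zero.
public class ConsumerLagSketch {

    /**
     * For every partition with a committed offset, compute
     * max(logEndOffset - committedOffset, 0). Partitions without a
     * known log-end offset are skipped.
     */
    public static Map<String, Long> perPartitionLag(
            Map<String, Long> committedOffsets,  // e.g. "orders-0" -> 1500
            Map<String, Long> logEndOffsets) {   // e.g. "orders-0" -> 1750
        Map<String, Long> lag = new HashMap<>();
        for (Map.Entry<String, Long> e : committedOffsets.entrySet()) {
            Long end = logEndOffsets.get(e.getKey());
            if (end != null) {
                lag.put(e.getKey(), Math.max(end - e.getValue(), 0L));
            }
        }
        return lag;
    }

    public static void main(String[] args) {
        Map<String, Long> committed = Map.of("orders-0", 1500L, "orders-1", 900L);
        Map<String, Long> ends = Map.of("orders-0", 1750L, "orders-1", 900L);
        // orders-0 lags by 250 records, orders-1 is fully caught up (0).
        System.out.println(perPartitionLag(committed, ends));
    }
}
```

In a real tool the committed offsets would come from `Admin#listConsumerGroupOffsets` and the end offsets from `Consumer#endOffsets` (or `Admin#listOffsets`); the point of the proposal in this thread is that the broker already holds both sides of the subtraction.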