It would be very convenient for consumer applications that do not collect
and ship their own metrics to have the Kafka cluster do this for them.

At larger scales (e.g., thousands of partitions and hundreds of consumer
groups, or more), the cardinality of metrics is very high for a broker and
very challenging for a metrics collector to pull out of JMX. Consumer groups
in particular often have randomly generated IDs which, depending on the value
of the broker's offsets.retention.minutes config, can linger for days or weeks.

KIP-714 is significant for reporting lag at larger scales because it can
bypass the broker's JMX entirely. The client already collects
consumer-fetch-manager-metrics and can report them to the cluster; the broker
can then feed those metrics to subscriptions, and this "just works" without
new code in the group coordinator.
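
For reference, the lag figure discussed in this thread is a per-partition
difference between a partition's end offset (or last stable offset) and the
group's committed offset. A minimal Python sketch of that arithmetic follows.
The function names, the (topic, partition) tuple keys, and the sample offsets
are all hypothetical and only illustrate the calculation; this is not how the
broker, KIP-714, or any Kafka client implements it.

```python
from typing import Dict, Tuple

# Hypothetical key type for this sketch: (topic name, partition number).
TopicPartition = Tuple[str, int]


def partition_lag(
    end_offsets: Dict[TopicPartition, int],
    committed_offsets: Dict[TopicPartition, int],
) -> Dict[TopicPartition, int]:
    """Return per-partition lag as end offset minus committed offset."""
    lag: Dict[TopicPartition, int] = {}
    for tp, end in end_offsets.items():
        committed = committed_offsets.get(tp)
        if committed is None:
            # The group has no committed offset for this partition,
            # so this sketch reports no lag value for it.
            continue
        # Clamp at zero in case the two offsets were sampled at different times.
        lag[tp] = max(end - committed, 0)
    return lag


def total_lag(lag: Dict[TopicPartition, int]) -> int:
    """Sum the per-partition lag into one figure for the group."""
    return sum(lag.values())


if __name__ == "__main__":
    # Made-up sample offsets, for illustration only.
    end = {("orders", 0): 120, ("orders", 1): 75, ("orders", 2): 10}
    committed = {("orders", 0): 100, ("orders", 1): 80}
    lag = partition_lag(end, committed)
    print(lag)
    print(total_lag(lag))
```

Both inputs are already known to the broker, which is the point made further
down the thread: no client-side reporting is needed to derive this number.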

________________________________
From: Николай Ижиков <nizhikov....@gmail.com> on behalf of Nikolay Izhikov 
<nizhi...@apache.org>
Sent: Wednesday, February 16, 2022 7:11 AM
To: dev@kafka.apache.org <dev@kafka.apache.org>
Subject: Re: [DISCUSSION] New broker metric. Per partition consumer offset

Chris, thanks for the support.

Dear Kafka committers, could you please advise me:

Do you support my proposal?
Can I implement the new metrics within the scope of a separate KIP?

KIP-714 seems to me to be a much more complex improvement.
Moreover, it has a similar but slightly different goal.

All I propose is to expose the existing offset data as metrics on the broker side.

> On Feb 16, 2022, at 17:52, Chris Egerton <chr...@confluent.io.INVALID> wrote:
>
> Hi Nikolay,
>
> Yep, makes sense to me 👍
>
> Sounds like the motivation here is similar to KIP-714 [1], which allows
> clients to publish their own metrics directly to a broker. It seems like
> one reason this use case isn't already addressed in that KIP is that, if
> all you're doing is taking the delta between a consumer group's
> latest-committed offsets and the latest stable offsets (LSO) for a set of
> topic partitions, none of that requires the consumer to directly publish
> metrics to the broker instead of implicitly updating that metric by
> committing offsets. In short, as you've noted--that data is already
> available on the broker.
>
> I think you make a reasonable case and, coupled with the precedent set by
> KIP-714 (which, though not yet accepted, seems to have significant traction
> at the moment), it'd be nice to see these metrics available broker-side.
>
> I do wonder if there's a question about where the line should be drawn for
> other client metrics, but will leave that to people more familiar with
> broker logic to think through.
>
> [1] -
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability#KIP714:Clientmetricsandobservability-Motivation
>
> Cheers,
>
> Chris
>
> On Wed, Feb 16, 2022 at 9:23 AM Nikolay Izhikov <nizhi...@apache.org> wrote:
>
>> Hello, Chris.
>>
>> Thanks for the feedback.
>>
>>> Have you seen the consumer-side lag metrics [1]? "records-lag",
>>
>> Yes, I’m aware of these metrics.
>>
>>> If so, I'd be curious to know what the motivation for duplicating
>>> existing client metrics onto brokers would be?
>>
>> It can be a complex task to set up and access monitoring data for all
>> consumers:
>> Clients can be new, experimental, and not yet integrated into the company's
>> monitoring solution.
>> Instances can come and go.
>> Clients can change addresses, etc., for reasons unrelated to Kafka.
>>
>> I think it will be useful to have per-partition consumer offset metrics on
>> the broker side.
>> It allows a Kafka administrator to collect monitoring data in one place.
>>
>> Moreover, this data is already available on the broker.
>> All we need to do is expose it.
>>
>> Does that make sense to you?
>>
>>> On Feb 16, 2022, at 17:01, Chris Egerton <chr...@confluent.io.INVALID> wrote:
>>>
>>> Hi Nikolay,
>>>
>>> Have you seen the consumer-side lag metrics [1]? "records-lag",
>>> "records-lag-avg", "records-lag-max" all give lag stats on a
>>> per-topic-partition basis.
>>>
>>> If so, I'd be curious to know what the motivation for duplicating
>>> existing client metrics onto brokers would be?
>>>
>>> [1] https://kafka.apache.org/31/documentation.html#consumer_fetch_monitoring
>>>
>>> Cheers,
>>>
>>> Chris
>>>
>>> On Wed, Feb 16, 2022 at 4:38 AM Nikolay Izhikov <nizhi...@apache.org> wrote:
>>>
>>>> Hello, Kafka team.
>>>>
>>>> When running in production, a common user question is "How big is the
>>>> lag between producer and consumer?"
>>>> We have the `kafka-consumer-groups.sh` tool and
>>>> `AdminClient#getListConsumerGroupOffsetsCall` to answer the question.
>>>>
>>>> There are even detailed guides on how to calculate *consumer lag* with
>>>> built-in Kafka tools [1].
>>>>
>>>> Obviously, the approach with a tool or the AdminClient requires additional
>>>> coding and setup, which can be inconvenient.
>>>>
>>>> I think Kafka should provide a per-partition consumer offset metric.
>>>> It would simplify running and monitoring Kafka deployments in production.
>>>>
>>>> I looked at `GroupMetadataManager.scala` and think it is possible to add
>>>> those metrics.
>>>>
>>>> What do you think?
>>>> Do we need these metrics on the Kafka broker?
>>>>
>>>> [1] https://www.baeldung.com/java-kafka-consumer-lag
>>
>>
