Re: [DISCUSS] KIP-489 Kafka Consumer Record Latency Metric

Sean Glover Wed, 11 Dec 2019 08:52:38 -0800

Hello again,

There has been some interest in this KIP recently.  I'm bumping the thread
to encourage feedback on the design.


Regards,
Sean

On Mon, Jul 15, 2019 at 9:01 AM Sean Glover <sean.glo...@lightbend.com>
wrote:

> To hopefully spark some discussion I've copied the motivation section from
> the KIP:
>
> Consumer lag is a useful metric to monitor how many records are queued to
> be processed.  We can look at individual lag per partition or we may
> aggregate metrics. For example, we may want to monitor what the maximum lag
> of any particular partition in our consumer subscription so we can identify
> hot partitions, caused by an insufficient producing partitioning strategy.
> We may want to monitor a sum of lag across all partitions so we have a
> sense as to our total backlog of messages to consume. Lag in offsets is
> useful when you have a good understanding of your messages and processing
> characteristics, but it doesn’t tell us how far behind *in time* we are.
> This is known as wait time in queueing theory, or more informally it’s
> referred to as latency.
>
> The latency of a message can be defined as the difference between when
> that message was first produced to when the message is received by a
> consumer.  The latency of records in a partition correlates with lag, but a
> larger lag doesn’t necessarily mean a larger latency. For example, a topic
> consumed by two separate application consumer groups A and B may have
> similar lag, but different latency per partition.  Application A is a
> consumer which performs CPU intensive business logic on each message it
> receives. It’s distributed across many consumer group members to handle the
> load quickly enough, but since its processing time is slower, it takes
> longer to process each message per partition.  Meanwhile, Application B is
> a consumer which performs a simple ETL operation to land streaming data in
> another system, such as HDFS. It may have similar lag to Application A, but
> because it has a faster processing time its latency per partition is
> significantly less.
>
> If the Kafka Consumer reported a latency metric it would be easier to
> build Service Level Agreements (SLAs) based on non-functional requirements
> of the streaming system.  For example, the system must never have a latency
> of greater than 10 minutes. This SLA could be used in monitoring alerts or
> as input to automatic scaling solutions.
>
> On Thu, Jul 11, 2019 at 12:36 PM Sean Glover <sean.glo...@lightbend.com>
> wrote:
>
>> Hi kafka-dev,
>>
>> I've created KIP-489 as a proposal for adding latency metrics to the
>> Kafka Consumer in a similar way as record-lag metrics are implemented.
>>
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/489%3A+Kafka+Consumer+Record+Latency+Metric
>>
>> Regards,
>> Sean
>>
>> --
>> Principal Engineer, Lightbend, Inc.
>>
>> <http://lightbend.com>
>>
>> @seg1o <https://twitter.com/seg1o>, in/seanaglover
>> <https://www.linkedin.com/in/seanaglover/>
>>
>
>
> --
> Principal Engineer, Lightbend, Inc.
>
> <http://lightbend.com>
>
> @seg1o <https://twitter.com/seg1o>, in/seanaglover
> <https://www.linkedin.com/in/seanaglover/>
>

Re: [DISCUSS] KIP-489 Kafka Consumer Record Latency Metric

Reply via email to