[jira] [Created] (KAFKA-8656) Kafka Consumer Record Latency Metric

Sean Glover (JIRA) Thu, 11 Jul 2019 09:25:18 -0700

Sean Glover created KAFKA-8656:
----------------------------------

             Summary: Kafka Consumer Record Latency Metric
                 Key: KAFKA-8656
                 URL: https://issues.apache.org/jira/browse/KAFKA-8656
             Project: Kafka
          Issue Type: New Feature
          Components: metrics
            Reporter: Sean Glover
            Assignee: Sean Glover



Consumer lag is a useful metric to monitor how many records are queued to be 
processed.  We can look at individual lag per partition or we may aggregate 
metrics. For example, we may want to monitor what the maximum lag of any 
particular partition in our consumer subscription so we can identify hot 
partitions, caused by an insufficient producing partitioning strategy.  We may 
want to monitor a sum of lag across all partitions so we have a sense as to our 
total backlog of messages to consume. Lag in offsets is useful when you have a 
good understanding of your messages and processing characteristics, but it 
doesn’t tell us how far behind _in time_ we are.  This is known as wait time in 
queueing theory, or more informally it’s referred to as latency.

The latency of a message can be defined as the difference between when that 
message was first produced to when the message is received by a consumer.  The 
latency of records in a partition correlates with lag, but a larger lag doesn’t 
necessarily mean a larger latency. For example, a topic consumed by two 
separate application consumer groups A and B may have similar lag, but 
different latency per partition.  Application A is a consumer which performs 
CPU intensive business logic on each message it receives. It’s distributed 
across many consumer group members to handle the load quickly enough, but since 
its processing time is slower, it takes longer to process each message per 
partition.  Meanwhile, Application B is a consumer which performs a simple ETL 
operation to land streaming data in another system, such as HDFS. It may have 
similar lag to Application A, but because it has a faster processing time its 
latency per partition is significantly less.

If the Kafka Consumer reported a latency metric it would be easier to build 
Service Level Agreements (SLAs) based on non-functional requirements of the 
streaming system.  For example, the system must never have a latency of greater 
than 10 minutes. This SLA could be used in monitoring alerts or as input to 
automatic scaling solutions.

[KIP-488|[https://cwiki.apache.org/confluence/display/KAFKA/488%3A+Kafka+Consumer+Record+Latency+Metric]]

 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

[jira] [Created] (KAFKA-8656) Kafka Consumer Record Latency Metric

Reply via email to