Matthias J. Sax created KAFKA-18344:
---------------------------------------

             Summary: Consider to distinguish between multiple "positions"
                 Key: KAFKA-18344
                 URL: https://issues.apache.org/jira/browse/KAFKA-18344
             Project: Kafka
          Issue Type: Improvement
          Components: clients, consumer
            Reporter: Matthias J. Sax


KafkaConsumer currently maintains a "position", which tracks the max offset of 
records returned via `poll()` (strictly, that offset plus one, i.e., the offset 
of the next record to return).

This "position" is used to compute the consumer "lag metrics". This implies 
that lag is computed slightly differently on the consumer than in other 
tools, which use `endOffset - committedOffset`: "position" does not 
reflect the latest _processed_ record, but might be ahead of what the 
application code has actually processed. If lag is computed as `endOffset - 
committedOffset`, the reported lag errs on the high side, i.e., it is larger 
than (or equal to) the real lag, which might actually provide better 
semantics. It seems undesirable that the consumer lag metric could be smaller 
than the actual lag.
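
To make the difference concrete, here is a minimal arithmetic sketch. It does 
not use the Kafka client; the class name `LagExample`, the helper `lag()`, and 
all offsets are made up for illustration. It assumes one partition with an end 
offset of 100, a `poll()` that returned offsets 40..49, an application that has 
finished processing up to offset 44, and a last commit at offset 43.

```java
// Illustrative only: plain arithmetic, no Kafka classes involved.
final class LagExample {

    // Lag relative to some reference offset, clamped at zero (the clamp is an
    // assumption of this sketch, not a statement about the consumer metrics).
    static long lag(long endOffset, long referenceOffset) {
        return Math.max(0L, endOffset - referenceOffset);
    }

    public static void main(String[] args) {
        long endOffset = 100L;     // log end offset of the partition
        long position = 50L;       // poll() returned 40..49, so position is 50
        long nextToProcess = 45L;  // application finished offsets 40..44
        long committed = 43L;      // last committed offset lags behind processing

        System.out.println("position-based lag:  " + lag(endOffset, position));
        System.out.println("real lag:            " + lag(endOffset, nextToProcess));
        System.out.println("committed-based lag: " + lag(endOffset, committed));
    }
}
```

With these numbers the position-based value is the smallest of the three and 
the committed-based value is the largest, which is the ordering described 
above.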

We should consider updating the position of the consumer differently:
 # A simple change could be to update the position to the offset of the 
first/oldest record in a `poll()` call (instead of the latest/newest, as we do 
right now), so that the position cannot get ahead and make lag "too small".
 # We could also try to hook into the returned `ConsumerRecords` iterator to 
track the position at a finer granularity, on a per-record basis.
 # We could track multiple positions, like "processed position" and "fetched 
position" (note that "fetched position" might be even further ahead than the 
current position because, depending on `max.poll.records`, not all fetched 
records might be returned from `poll()`).
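
As a rough sketch of how the second and third ideas could fit together, the 
following standalone Java class keeps three positions for one partition and 
wraps a record iterator so that a record counts as processed once the caller 
asks for the next one. Everything here is hypothetical: `PositionTracker`, its 
methods, and the "processed when the iterator advances" rule are assumptions 
for illustration, not existing `KafkaConsumer` or `ConsumerRecords` API.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.ToLongFunction;

// Hypothetical per-partition tracker; none of these names exist in the Kafka client.
final class PositionTracker {
    private long fetchedPosition;   // next offset to fetch from the broker
    private long returnedPosition;  // next offset poll() would return (today's "position")
    private long processedPosition; // next offset the application has not finished yet

    PositionTracker(long startOffset) {
        fetchedPosition = returnedPosition = processedPosition = startOffset;
    }

    // Each callback takes the offset of a record and moves the position past it.
    void onFetched(long lastOffset)  { fetchedPosition = Math.max(fetchedPosition, lastOffset + 1); }
    void onReturned(long lastOffset) { returnedPosition = Math.max(returnedPosition, lastOffset + 1); }
    void onProcessed(long offset)    { processedPosition = Math.max(processedPosition, offset + 1); }

    long fetchedPosition()   { return fetchedPosition; }
    long returnedPosition()  { return returnedPosition; }
    long processedPosition() { return processedPosition; }

    long returnedLag(long endOffset)  { return Math.max(0L, endOffset - returnedPosition); }
    long processedLag(long endOffset) { return Math.max(0L, endOffset - processedPosition); }

    // Wraps an iterator: the previously returned record is marked as processed
    // when the caller calls hasNext() or next() again.
    <T> Iterator<T> track(Iterator<T> records, ToLongFunction<T> offsetOf) {
        return new Iterator<T>() {
            private boolean hasPrevious = false;
            private long previousOffset;

            @Override
            public boolean hasNext() {
                markPreviousProcessed();
                return records.hasNext();
            }

            @Override
            public T next() {
                markPreviousProcessed();
                T record = records.next();
                previousOffset = offsetOf.applyAsLong(record);
                hasPrevious = true;
                return record;
            }

            private void markPreviousProcessed() {
                if (hasPrevious) {
                    onProcessed(previousOffset);
                    hasPrevious = false;
                }
            }
        };
    }

    public static void main(String[] args) {
        PositionTracker tracker = new PositionTracker(40L);
        tracker.onFetched(59L);   // offsets 40..59 were fetched
        tracker.onReturned(49L);  // poll() handed out only 40..49
        // Records are modeled as bare offsets to keep the sketch self-contained.
        Iterator<Long> it = tracker.track(List.of(40L, 41L, 42L).iterator(), Long::longValue);
        while (it.hasNext()) {
            it.next();            // the application would process the record here
        }
        System.out.println("fetched=" + tracker.fetchedPosition()
                + " returned=" + tracker.returnedPosition()
                + " processed=" + tracker.processedPosition());
    }
}
```

In this sketch a lag metric based on `processedLag()` can never be smaller than 
one based on `returnedLag()`, because the processed position only advances for 
records the application has already iterated past. Whether "iterated past" is 
a good enough proxy for "processed" is an open design question.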

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
