Matthias J. Sax created KAFKA-18344:
---------------------------------------
Summary: Consider to distinguish between multiple "positions"
Key: KAFKA-18344
URL: https://issues.apache.org/jira/browse/KAFKA-18344
Project: Kafka
Issue Type: Improvement
Components: clients, consumer
Reporter: Matthias J. Sax

KafkaConsumer currently maintains a "position", which is the max offset of records returned via `poll()`. This "position" is used to compute the consumer "lag metrics". This implies that lag is computed slightly differently on the consumer compared to other tools, which use `endOffset - committedOffset`, because "position" does not reflect the latest _processed_ record but might be ahead of what the application code has actually processed.

If lag is computed as `endOffset - committedOffset`, the reported lag is always behind, i.e., larger than the real lag, which might actually provide better semantics. It seems undesirable that the consumer lag metric could be smaller than the actual lag...

We should consider updating the position of the consumer differently:
 # A simple change could be to update the position to the offset of the first/oldest record in a `poll()` call (instead of the latest/newest, as we do right now), to avoid the position getting ahead and the lag being "too small".
 # We could also try to hook into the returned `ConsumerRecords` iterator to track the position in a more fine-grained way, on a per-record basis.
 # We could track multiple positions, like "processed position" and "fetched position" (note that "fetched position" might be even further ahead than the current position because, based on `max.poll.records`, not all fetched records might be returned from `poll()`).
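To make the difference between these positions concrete, here is a minimal, self-contained sketch of the third option. It is not KafkaConsumer code: the class `PositionTracker`, its `Kind` enum, and all method names are hypothetical and exist only for illustration. The sketch also assumes that every position is stored as "next offset" and that offsets are contiguous (no compaction gaps or transaction markers), which does not hold in general.

```java
/**
 * Hypothetical helper (not part of kafka-clients) that tracks several
 * "positions" for a single partition and derives a lag from each one.
 * Assumption: offsets are contiguous and each position is a "next offset".
 */
final class PositionTracker {
    enum Kind { FETCHED, RETURNED, PROCESSED, COMMITTED }

    private long fetched;    // next offset to fetch from the broker
    private long returned;   // next offset poll() would return (today's "position")
    private long processed;  // next offset the application has not processed yet
    private long committed;  // offset recorded by the last commit

    PositionTracker(long startOffset) {
        fetched = returned = processed = committed = startOffset;
    }

    /** The fetcher buffered {@code count} more records. */
    void recordsFetched(long count) {
        fetched += count;
    }

    /** poll() handed {@code count} buffered records to the application. */
    void recordsReturned(long count) {
        if (returned + count > fetched) {
            throw new IllegalStateException("cannot return records that were not fetched");
        }
        returned += count;
    }

    /** The application finished processing one returned record. */
    void recordProcessed() {
        if (processed >= returned) {
            throw new IllegalStateException("cannot process records that were not returned");
        }
        processed++;
    }

    /** Commit everything that has been processed so far. */
    void commit() {
        committed = processed;
    }

    /** Lag relative to the chosen position, floored at zero. */
    long lag(long endOffset, Kind kind) {
        long position;
        switch (kind) {
            case FETCHED:   position = fetched;   break;
            case RETURNED:  position = returned;  break;
            case PROCESSED: position = processed; break;
            default:        position = committed; break;
        }
        return Math.max(0L, endOffset - position);
    }
}
```

As a worked example under these assumptions: start at offset 100 with an end offset of 200, fetch 50 records, return 30 of them from `poll()` (capped by `max.poll.records`), and process 10. The lag is then 50 against the fetched position, 70 against the returned position (what the consumer reports today), 90 against the processed position, and 100 against the committed offset until the next commit.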