[ https://issues.apache.org/jira/browse/KAFKA-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14389466#comment-14389466 ]
Jiangjie Qin commented on KAFKA-2076:
-------------------------------------

[~jkreps] Totally agree that we need to think about improving the new consumer API to close the gap between the new consumer and the old consumer. I think the new consumer API should provide all the reasonable functions that simple consumer supports. Obviously it won't be as powerful, because simple consumer currently supports sending every type of request and receiving every type of response.

I think we can create the following new APIs:

long earliestOffsetsFor(List<TopicPartition> partitions)
long latestOffsetsFor(List<TopicPartition> partitions)
long offsetsBefore(Map<TopicPartition, Long> partitions)

Not sure if we need a KIP for this; if you think so, please let me know.

I went through all the requests simple consumer can send:
1. ConsumerMetadataRequest - should be supported after the coordinator implementation
2. FetchRequest - supported with seek() and poll()
3. OffsetCommitRequest - supported with commit() and commit(offsetMap)
4. OffsetRequest - not supported
5. OffsetFetchRequest - supported by committed()
6. TopicMetadataRequest - supported with partitionsFor()

From what I can see, and as you also mentioned, only the following useful functions are not yet supported by the new consumer:
1. getOffsetBefore
2. earliestOrLatest
3. high watermark

Implementation-wise, I think the high watermark (HW) differs from the other offsets in the following ways:
1. The consumer offset is determined by the consumer, while the HW is determined by the producer. This means consumer offsets need only minimal communication with the broker, but the HW needs frequent communication.
2. Typically a user only fetches offsets when starting consumption, but may care about the HW both before starting and during consumption, since it reflects lag. This means HW updates should be cheap; otherwise the overhead would be big.

Based on the above two points, the implementation I have in mind now is to store the HW in a separate map. The value will be updated when:
1. latestOffsetFor(partition) is called, which always sends an OffsetRequest.
2. A piggybacked HW is received in a fetch response.

I don't quite understand why the server-side offset timestamp would be a blocker for adding this API. It looks transparent to the user, right?

> Add an API to new consumer to allow user get high watermark of partitions.
> --------------------------------------------------------------------------
>
>                 Key: KAFKA-2076
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2076
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Jiangjie Qin
>
> We have a use case that user wants to know how far it is behind a particular
> partition on startup. Currently in each fetch response, we have high
> watermark for each partition, we only keep a global max-lag metric. It would
> be better that we keep a record of high watermark per partition and update it
> on each fetch response. We can add a new API to let user query the high
> watermark.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
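
For illustration only, here is a rough Java sketch of the separate HW map and its two update paths as described in the comment above. Everything in it is an assumption made for the example, not the actual consumer code: the class name HighWatermarkCache, the methods onFetchResponse(), cachedHighWatermark() and lag(), the UNKNOWN sentinel, and the minimal TopicPartition stand-in are all hypothetical, and the broker round trip for an OffsetRequest is replaced by a pluggable function.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.ToLongFunction;

// Hypothetical sketch; none of these names are part of the Kafka client API.
final class HighWatermarkCache {

    // Minimal stand-in for a topic-partition key so the sketch is self-contained.
    static final class TopicPartition {
        final String topic;
        final int partition;

        TopicPartition(String topic, int partition) {
            this.topic = topic;
            this.partition = partition;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof TopicPartition)) return false;
            TopicPartition other = (TopicPartition) o;
            return partition == other.partition && topic.equals(other.topic);
        }

        @Override
        public int hashCode() {
            return Objects.hash(topic, partition);
        }
    }

    // Assumed sentinel for "no HW seen yet for this partition".
    static final long UNKNOWN = -1L;

    // The separate per-partition HW map.
    private final Map<TopicPartition, Long> highWatermarks = new ConcurrentHashMap<>();

    // Stand-in for the broker round trip (an OffsetRequest for the latest offset).
    private final ToLongFunction<TopicPartition> offsetRequest;

    HighWatermarkCache(ToLongFunction<TopicPartition> offsetRequest) {
        this.offsetRequest = offsetRequest;
    }

    // Update path 1: explicit query; always asks the broker and refreshes the map.
    long latestOffsetFor(TopicPartition tp) {
        long hw = offsetRequest.applyAsLong(tp);
        highWatermarks.put(tp, hw);
        return hw;
    }

    // Update path 2: HW piggybacked on a fetch response; no extra request is sent.
    void onFetchResponse(TopicPartition tp, long highWatermark) {
        highWatermarks.put(tp, highWatermark);
    }

    // Cheap local read of the last known HW, or UNKNOWN if none was recorded.
    long cachedHighWatermark(TopicPartition tp) {
        return highWatermarks.getOrDefault(tp, UNKNOWN);
    }

    // Lag of a consumer position behind the last known HW, or UNKNOWN.
    long lag(TopicPartition tp, long position) {
        long hw = cachedHighWatermark(tp);
        return hw == UNKNOWN ? UNKNOWN : Math.max(0L, hw - position);
    }
}
```

In this sketch the cheap path is onFetchResponse(), which costs one map write per partition per fetch, so reading lag during consumption needs no extra broker traffic; only latestOffsetFor() pays for a request.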