[ https://issues.apache.org/jira/browse/KAFKA-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731538#comment-14731538 ]

Will Funnell commented on KAFKA-2500:
-------------------------------------

[~hachikuji] From my understanding it would seem that KAFKA-2076 may not 
exactly solve the use case here.

What I need, as solved in KAFKA-1977 for the current consumer API, is the 
ability to consume a full snapshot of all the messages on a log compacted 
topic, ensuring each key has been consumed at least once.

It would seem that although KAFKA-2076 does make this possible, it requires a 
separate call to discover the high watermark. By the time that call returns, 
the topic may have received further messages; but if the high watermark is 
returned with each message, it's possible to tell whether a given message is 
the last one and to stop consuming immediately.

It would also be very useful to expose the log cleaner point; that way you know 
when you have consumed past the point where any duplicate keys could have been 
compacted away.
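The termination condition described above can be sketched without a broker. The following is an illustrative simulation, not the actual Kafka consumer API: the class and method names (SnapshotConsumer, snapshotComplete) are hypothetical, and partitions are modelled as plain integer-keyed maps. It captures each partition's high watermark once up front (the separate call KAFKA-2076 would require), then consumes until every partition's position reaches the captured offset, so messages appended after the snapshot begins do not prevent the loop from terminating.

```java
import java.util.*;

// Hypothetical, broker-free sketch of the snapshot pattern discussed in
// this ticket. Partition ids are plain Integers; offsets are Longs.
public class SnapshotConsumer {

    // True once every partition's position has reached the high watermark
    // captured when the snapshot started.
    static boolean snapshotComplete(Map<Integer, Long> positions,
                                    Map<Integer, Long> highWatermarks) {
        for (Map.Entry<Integer, Long> e : highWatermarks.entrySet()) {
            Long pos = positions.get(e.getKey());
            if (pos == null || pos < e.getValue()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // High watermarks captured up front by a separate call, as the
        // comment notes KAFKA-2076 would require.
        Map<Integer, Long> highWatermarks = Map.of(0, 3L, 1, 2L);
        Map<Integer, Long> positions = new HashMap<>(Map.of(0, 0L, 1, 0L));

        // Simulated fetch loop: "consume" one message per partition per pass
        // until each partition reaches its captured high watermark.
        while (!snapshotComplete(positions, highWatermarks)) {
            for (Integer p : highWatermarks.keySet()) {
                if (positions.get(p) < highWatermarks.get(p)) {
                    positions.merge(p, 1L, Long::sum);
                }
            }
        }
        System.out.println("snapshot complete at " + positions);
    }
}
```

If the high watermark were instead delivered with each message, as requested here, the separate up-front call would be unnecessary and the same check could be made per message.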

> Make logEndOffset available in the 0.8.3 Consumer
> -------------------------------------------------
>
>                 Key: KAFKA-2500
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2500
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: consumer
>    Affects Versions: 0.8.3
>            Reporter: Will Funnell
>            Assignee: Jason Gustafson
>            Priority: Critical
>             Fix For: 0.8.3
>
>
> Originally created in the old consumer here: 
> https://issues.apache.org/jira/browse/KAFKA-1977
> The requirement is to create a snapshot from the Kafka topic but NOT do 
> continual reads after that point. For example, you might be creating a backup 
> of the data to a file.
> This ticket covers the addition of the functionality to the new consumer.
> In order to achieve that, a recommended solution by Joel Koshy and Jay Kreps 
> was to expose the high watermark, as maxEndOffset, from the FetchResponse 
> object through to each MessageAndMetadata object in order to be aware when 
> the consumer has reached the end of each partition.
> The submitted patch achieves this by adding the maxEndOffset to the 
> PartitionTopicInfo, which is updated when a new message arrives in the 
> ConsumerFetcherThread and then exposed in MessageAndMetadata.
> See here for discussion:
> http://search-hadoop.com/m/4TaT4TpJy71



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
