[ 
https://issues.apache.org/jira/browse/KAFKA-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735694#comment-14735694
 ] 

Will Funnell commented on KAFKA-2500:
-------------------------------------

[~hachikuji] thanks for looking into this. I'll try to clarify my requirement in 
case it helps:
Given a log-compacted topic, I need to consume every key at least once, then 
cancel the subscription.

> It seems like there could also be new records pushed in the time that it 
> takes for the fetch response to be returned, right? It only reduces the 
> window.

At the moment, in my implementation from KAFKA-1977, the high watermark can be 
compared with the current offset as each message is received; once they match, 
you can finish.

It is my understanding that to achieve the same functionality you would need to 
call the API specified in KAFKA-2076 to fetch the high watermark after every 
message, which does not seem performant.
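
For illustration, here is a minimal sketch of the kind of loop being discussed, 
written against the later Java KafkaConsumer API (endOffsets() and position() 
only appeared in releases after 0.8.3, so this is not what was available when 
this ticket was filed). The topic name, group id and writeToBackup sink are 
placeholders, and re-querying the end offsets on every pass stands in for the 
per-fetch high watermark this ticket asks the consumer to expose.

import java.time.Duration;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import java.util.Set;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class CompactedTopicSnapshot {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "snapshot-taker");          // placeholder
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("snapshot-topic"));  // placeholder topic

            boolean caughtUp = false;
            while (!caughtUp) {
                ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<byte[], byte[]> record : records) {
                    writeToBackup(record);                                    // placeholder sink
                }

                Set<TopicPartition> assignment = consumer.assignment();
                if (assignment.isEmpty()) {
                    continue;                                                 // nothing assigned yet
                }

                // Re-fetch the end offsets on every pass so records produced (or keys
                // superseded and compacted) since the last check are not missed.
                Map<TopicPartition, Long> endOffsets = consumer.endOffsets(assignment);
                caughtUp = true;
                for (TopicPartition tp : assignment) {
                    if (consumer.position(tp) < endOffsets.get(tp)) {
                        caughtUp = false;
                        break;
                    }
                }
            }
        }
    }

    private static void writeToBackup(ConsumerRecord<byte[], byte[]> record) {
        // Placeholder for the snapshot/backup sink.
        System.out.printf("partition=%d offset=%d%n", record.partition(), record.offset());
    }
}

With the high watermark exposed on each fetch response, the extra endOffsets() 
round trip on every iteration would not be needed, which is the gap this ticket 
is about.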

I think Jay Kreps intimates this when defining the HW:
> 1. Consumer offset is determined by consumer, while HW is determined by 
> producer. This means consumer offsets needs only minimum communication with 
> broker, but HW needs frequent communication.
> 2. Typically user will only fetch offsets when starting consumption but user 
> may care about HW both before starting consumption and during the consuming 
> as it reflects lags. This means the HW updates should be cheap otherwise the 
> overhead would be big.

If I make an OffsetRequest (with HW information) call only at the beginning, 
then by the time my partition's offset matches that HW I will have missed 
messages that were compacted in the meantime: a key whose older record is 
compacted away because a newer record arrived beyond that initial HW would 
never be read at all.

> Make logEndOffset available in the 0.8.3 Consumer
> -------------------------------------------------
>
>                 Key: KAFKA-2500
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2500
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: consumer
>    Affects Versions: 0.8.3
>            Reporter: Will Funnell
>            Assignee: Jason Gustafson
>            Priority: Critical
>             Fix For: 0.8.3
>
>
> Originally created in the old consumer here: 
> https://issues.apache.org/jira/browse/KAFKA-1977
> The requirement is to create a snapshot from the Kafka topic but NOT do 
> continual reads after that point. For example you might be creating a backup 
> of the data to a file.
> This ticket covers the addition of the functionality to the new consumer.
> In order to achieve that, a recommended solution by Joel Koshy and Jay Kreps 
> was to expose the high watermark, as maxEndOffset, from the FetchResponse 
> object through to each MessageAndMetadata object in order to be aware when 
> the consumer has reached the end of each partition.
> The submitted patch achieves this by adding the maxEndOffset to the 
> PartitionTopicInfo, which is updated when a new message arrives in the 
> ConsumerFetcherThread and then exposed in MessageAndMetadata.
> See here for discussion:
> http://search-hadoop.com/m/4TaT4TpJy71
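
For context, here is a rough sketch of how the patch described above might look 
from the consuming side, using the old high-level consumer that KAFKA-1977 
targets. The maxEndOffset() accessor is only the name the description above 
gives it and does not exist in any released client; the topic, group id, 
ZooKeeper address and writeToBackup sink are placeholders, and the exact 
comparison depends on whether maxEndOffset is inclusive or exclusive.

import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.MessageAndMetadata;

public class SnapshotWithMaxEndOffset {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181");   // placeholder
        props.put("group.id", "snapshot-taker");            // placeholder
        props.put("auto.offset.reset", "smallest");

        ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        // Single stream over a single-partition topic, for brevity.
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                connector.createMessageStreams(Collections.singletonMap("snapshot-topic", 1));
        ConsumerIterator<byte[], byte[]> it =
                streams.get("snapshot-topic").get(0).iterator();

        while (it.hasNext()) {
            MessageAndMetadata<byte[], byte[]> mam = it.next();
            writeToBackup(mam);                              // placeholder sink

            // Per the patch description: maxEndOffset would carry the high watermark
            // from the fetch that delivered this message. maxEndOffset() is
            // hypothetical and not part of any released client.
            if (mam.offset() >= mam.maxEndOffset()) {
                break;                                       // reached the end of the partition
            }
        }
        connector.shutdown();
    }

    private static void writeToBackup(MessageAndMetadata<byte[], byte[]> mam) {
        // Placeholder for the snapshot/backup sink.
        System.out.printf("partition=%d offset=%d%n", mam.partition(), mam.offset());
    }
}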



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
