Re: Is there a way to know when I've reached the end of a partition (consumed all messages) when using the high-level consumer?

James Cheng Mon, 11 May 2015 10:35:08 -0700

Thanks everyone.

To answer Charlie's question:


I'm doing some simple stream processing. I have Topics A,B, and C, all using 
log compaction and all recordings having primary keys. The data in Topic A is 
essentially a routing table that tells me which primary keys in Topics B and C 
I should pay attention to. So before I start consuming B and C, I need to have 
all/most of Topic A loaded into a local routing table.  As Topic A is updated, 
then I will continue to update my routing table, and use it to continually 
process events coming from B and C.

Hope that makes sense.

All of the solutions look good. Will, that patch does exactly what I want, but 
I'm not sure I want to patch Kafka right now. I'll keep it in mind. Thanks.

-James

On May 9, 2015, at 10:42 AM, Charlie Knudsen <charlie.knud...@smartthings.com> 
wrote:

> Hi James,
> What are you trying to do exactly? If all you are trying to do is monitor
> how far behind a consumer is getting you could use the ConsumerOffsetChecker.
> As described in the link below.
> http://community.spiceworks.com/how_to/77610-how-far-behind-is-your-kafka-consumer
> 
> Each message being processed will also have the offset and partition
> attached to it so with that data. I suppose that information plus info from
> a fetch response you could determine this with in an application.
> https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-FetchResponse
> 
> Does that help?
> 
> 
> On Fri, May 8, 2015 at 6:04 PM, James Cheng <jch...@tivo.com> wrote:
> 
>> Hi,
>> 
>> I want to use the high level consumer to read all partitions for a topic,
>> and know when I have reached "the end". I know "the end" might be a little
>> vague, since items keep showing up, but I'm trying to get as close as
>> possible. I know that more messages might show up later, but I want to know
>> when I've received all the items that are currently available in the topic.
>> 
>> Is there a standard/recommended way to do this?
>> 
>> I know one way to do it is to first issue an OffsetRequest for each
>> partition, which would get me the last offset, and then use that
>> information in my high level consumer to detect when I've reached that a
>> message with that offset. Which is exactly what the SimpleConsumer example
>> does (
>> https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example).
>> That involves finding the leader for the partition, etc etc. Not hard, but
>> a bunch of steps.
>> 
>> I noticed that kafkacat has an option similar to what I'm looking for:
>>  -e                 Exit successfully when last message received
>> 
>> Looking at the code, it appears that a FetchRequest returns the
>> HighwaterMarkOffset mark for a partition, and the API docs confirm that:
>> https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-FetchResponse
>> 
>> Does the Java high-level consumer expose the HighwaterMarkOffset in any
>> way? I looked but I couldn't find such a thing.
>> 
>> Thanks,
>> -James
>> 
>>

Re: Is there a way to know when I've reached the end of a partition (consumed all messages) when using the high-level consumer?

Reply via email to