Thanks everyone. To answer Charlie's question:
I'm doing some simple stream processing. I have Topics A, B, and C, all using log compaction, and all records have primary keys. The data in Topic A is essentially a routing table that tells me which primary keys in Topics B and C I should pay attention to. So before I start consuming B and C, I need to have all/most of Topic A loaded into a local routing table. As Topic A is updated, I will continue to update my routing table and use it to continually process events coming from B and C.

Hope that makes sense.

All of the solutions look good. Will, that patch does exactly what I want, but I'm not sure I want to patch Kafka right now. I'll keep it in mind.

Thanks.
-James

On May 9, 2015, at 10:42 AM, Charlie Knudsen <charlie.knud...@smartthings.com> wrote:

> Hi James,
> What are you trying to do exactly? If all you are trying to do is monitor
> how far behind a consumer is getting, you could use the ConsumerOffsetChecker,
> as described in the link below.
> http://community.spiceworks.com/how_to/77610-how-far-behind-is-your-kafka-consumer
>
> Each message being processed will also have the offset and partition
> attached to it. I suppose that with that information, plus info from
> a fetch response, you could determine this within an application.
> https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-FetchResponse
>
> Does that help?
>
>
> On Fri, May 8, 2015 at 6:04 PM, James Cheng <jch...@tivo.com> wrote:
>
>> Hi,
>>
>> I want to use the high level consumer to read all partitions for a topic,
>> and know when I have reached "the end". I know "the end" might be a little
>> vague, since items keep showing up, but I'm trying to get as close as
>> possible. I know that more messages might show up later, but I want to know
>> when I've received all the items that are currently available in the topic.
>>
>> Is there a standard/recommended way to do this?
>>
>> I know one way to do it is to first issue an OffsetRequest for each
>> partition, which would get me the last offset, and then use that
>> information in my high level consumer to detect when I've reached a
>> message with that offset. Which is exactly what the SimpleConsumer example
>> does (
>> https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example).
>> That involves finding the leader for the partition, etc etc. Not hard, but
>> a bunch of steps.
>>
>> I noticed that kafkacat has an option similar to what I'm looking for:
>>   -e    Exit successfully when last message received
>>
>> Looking at the code, it appears that a FetchRequest returns the
>> HighwaterMarkOffset for a partition, and the API docs confirm that:
>> https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-FetchResponse
>>
>> Does the Java high-level consumer expose the HighwaterMarkOffset in any
>> way? I looked but I couldn't find such a thing.
>>
>> Thanks,
>> -James
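
For reference, here is a minimal sketch of the OffsetRequest approach described in the quoted message. It assumes the per-partition "latest" offsets have already been fetched by some other means (the fetch itself is not shown). CatchUpTracker and its methods are hypothetical names, not part of any Kafka API. The sketch treats the "latest" offset as the offset the next produced message will receive, so the last existing message sits at that value minus one. It also assumes a partition with a nonzero "latest" offset still holds at least one message.

```java
import java.util.HashMap;
import java.util.Map;

/** Tracks whether a consumer has read up to a snapshot of each partition's end offset. */
class CatchUpTracker {
    // Partitions not yet caught up, mapped to the offset of the last message that must be seen.
    private final Map<Integer, Long> remaining = new HashMap<>();

    /**
     * @param logEndOffsets partition -> "latest" offset taken from an OffsetRequest
     *                      (assumption: obtained elsewhere before consumption starts)
     */
    CatchUpTracker(Map<Integer, Long> logEndOffsets) {
        for (Map.Entry<Integer, Long> e : logEndOffsets.entrySet()) {
            long lastExisting = e.getValue() - 1;
            if (lastExisting >= 0) { // partitions that never received a message are already done
                remaining.put(e.getKey(), lastExisting);
            }
        }
    }

    /** Call once per consumed message with the partition and offset attached to it. */
    void record(int partition, long offset) {
        Long target = remaining.get(partition);
        // Use >= rather than ==, since log compaction can leave gaps in the offset sequence.
        if (target != null && offset >= target) {
            remaining.remove(partition);
        }
    }

    /** True once every partition has reached the offset captured in the snapshot. */
    boolean isCaughtUp() {
        return remaining.isEmpty();
    }
}
```

In the routing-table scenario, the consumer of Topic A would call record() for each message, and consumption of Topics B and C would start once isCaughtUp() returns true. Messages that arrive in Topic A after the snapshot are simply applied as ongoing updates.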