For Flume, we set the consumer timeout configuration and catch the resulting exception, on the assumption that "no messages for a few seconds" == "the end".
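To make the pattern concrete, here is a minimal sketch. As far as I know, in the 0.8 high-level consumer the relevant setting is `consumer.timeout.ms`, and the stream iterator throws `ConsumerTimeoutException` once that much time passes with no message. The sketch below does not use Kafka at all: it models the same "read until idle, then catch the timeout" loop with a plain `BlockingQueue` so it runs without a broker. The names `IdleTimeoutDrain`, `IdleTimeoutException`, `nextOrTimeout`, and `drainUntilIdle` are made up for illustration and are not from Kafka or Flume.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a consumer stream; not part of Kafka or Flume.
class IdleTimeoutDrain {

    // Plays the role of the consumer's timeout exception.
    static class IdleTimeoutException extends RuntimeException {
        IdleTimeoutException(String msg) { super(msg); }
    }

    // Waits at most timeoutMs for the next message, then throws,
    // much like an iterator backed by a consumer-side timeout.
    static String nextOrTimeout(BlockingQueue<String> stream, long timeoutMs) {
        try {
            String msg = stream.poll(timeoutMs, TimeUnit.MILLISECONDS);
            if (msg == null) {
                throw new IdleTimeoutException("no message for " + timeoutMs + " ms");
            }
            return msg;
        } catch (InterruptedException e) {
            // Preserve the interrupt flag and stop reading.
            Thread.currentThread().interrupt();
            throw new IdleTimeoutException("interrupted while waiting");
        }
    }

    // Reads until the stream has been idle for timeoutMs and treats that as "the end".
    static List<String> drainUntilIdle(BlockingQueue<String> stream, long timeoutMs) {
        List<String> received = new ArrayList<>();
        try {
            while (true) {
                received.add(nextOrTimeout(stream, timeoutMs));
            }
        } catch (IdleTimeoutException e) {
            // Idle period elapsed: assume we have caught up.
            // More messages may still arrive later.
        }
        return received;
    }

    public static void main(String[] args) {
        BlockingQueue<String> stream = new LinkedBlockingQueue<>();
        stream.add("m1");
        stream.add("m2");
        stream.add("m3");
        List<String> got = drainUntilIdle(stream, 200);
        System.out.println(got.size() + " messages read before idle timeout");
    }
}
```

The trade-off is the one already stated: a slow producer or a broker hiccup longer than the timeout looks exactly like "the end", so the timeout has to be chosen with that in mind.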
On Sat, May 9, 2015 at 2:04 AM, James Cheng <jch...@tivo.com> wrote:
> Hi,
>
> I want to use the high level consumer to read all partitions for a topic,
> and know when I have reached "the end". I know "the end" might be a little
> vague, since items keep showing up, but I'm trying to get as close as
> possible. I know that more messages might show up later, but I want to know
> when I've received all the items that are currently available in the topic.
>
> Is there a standard/recommended way to do this?
>
> I know one way to do it is to first issue an OffsetRequest for each
> partition, which would get me the last offset, and then use that
> information in my high level consumer to detect when I've reached the
> message with that offset. That is exactly what the SimpleConsumer example
> does
> (https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example).
> That involves finding the leader for the partition, etc. Not hard, but
> a bunch of steps.
>
> I noticed that kafkacat has an option similar to what I'm looking for:
>   -e  Exit successfully when last message received
>
> Looking at the code, it appears that a FetchRequest returns the
> HighwaterMarkOffset for a partition, and the API docs confirm that:
> https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-FetchResponse
>
> Does the Java high-level consumer expose the HighwaterMarkOffset in any
> way? I looked but I couldn't find such a thing.
>
> Thanks,
> -James