We're running Kafka 0.7 and I'm hitting some issues trying to access the
newest n messages in a topic (or at least in a broker/partition combo),
and I'm wondering if my use case just isn't supported or if I'm missing
something.  What I'd like to do is get the most recent offset for a
broker/partition combo, subtract a number of bytes roughly equal to
messages_desired*bytes_per_message, and then issue a FetchRequest with
that offset and byte count.
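
Roughly, the arithmetic I have in mind is the following (a sketch with
made-up numbers; as I understand it, in 0.7 an offset is a byte position
in the log, and the latest one comes back from an OffsetRequest with
time = -1):

```java
public class FetchOffsetSketch {
    // Clamp-at-zero subtraction: back up from the latest byte offset by
    // roughly messagesDesired * bytesPerMessage bytes.
    static long fetchOffset(long latestOffset, int messagesDesired,
                            int bytesPerMessage) {
        long window = (long) messagesDesired * bytesPerMessage;
        return Math.max(0L, latestOffset - window);
    }

    public static void main(String[] args) {
        // Hypothetical numbers: pretend the latest offset came back as
        // 50 MB into the log.
        long latest = 52_428_800L;
        long start = fetchOffset(latest, 100, 1024);
        // A FetchRequest(topic, partition, start, 100 * 1024) would
        // then be issued -- but 'start' is almost certainly not aligned
        // to a real message boundary.
        System.out.println(start);
    }
}
```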

I gathered from this post
<http://mail-archives.apache.org/mod_mbox/kafka-users/201212.mbox/%3cccf8f23d.5e4a%25zhaoyong...@gmail.com%3E>
that I need to use the SimpleConsumer in order to do offset manipulation
beyond the start-from-beginning and start-from-end options.  And I saw
from this post
<http://mail-archives.apache.org/mod_mbox/incubator-kafka-users/201209.mbox/%3ccald69j0idczzff3nm-wrfvw5y6wwxrzfol8a1qqfugqukdo...@mail.gmail.com%3E
that the offsets returned by getOffsetsBefore are really only the major
checkpoints where log files are rolled over, every 500MB by default.  I
also found that if I take an offset returned by getOffsetsBefore,
subtract a fixed value, say 100KB, and submit that offset with a
FetchRequest, I get a kafka.common.InvalidMessageSizeException,
presumably because my computed offset didn't align with a real message
offset.
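
The exception does make sense to me in one way (this is a toy model of
the log, not the real on-disk format): the only offsets a fetch can
legally start from are the byte positions where messages begin, and
"latest minus a fixed amount" will only work if it happens to land on
one of them:

```java
import java.util.ArrayList;
import java.util.List;

public class BoundarySketch {
    // Given per-message on-disk sizes, return the byte offsets at which
    // each message starts -- the only offsets a fetch may begin at.
    static List<Long> boundaries(int[] messageSizes) {
        List<Long> starts = new ArrayList<>();
        long pos = 0;
        for (int size : messageSizes) {
            starts.add(pos);
            pos += size;
        }
        return starts;
    }

    public static void main(String[] args) {
        // Toy log: three messages of varying size, 1000 bytes total.
        int[] sizes = {300, 250, 450};
        List<Long> starts = boundaries(sizes);
        System.out.println(starts);
        long computed = 1000 - 100;  // "latest minus a fixed value"
        // 900 is mid-message, so the broker can't parse a message
        // there -- hence (I assume) InvalidMessageSizeException.
        System.out.println(starts.contains(computed));
    }
}
```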

As far as I can tell, this leaves me only able to find the most recent
milestone offset, which may be up to 500MB behind the current data, and
extract a batch from that point forward. Is there any other way that I'm
missing here? The two things that seem to be lacking are access to the
most recent offset and the ability to roll back from that offset by a
fixed number of bytes or messages without triggering the
InvalidMessageSizeException.
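
The only workaround I can see is the brute-force one: fetch forward from
the last milestone offset, record the start offset of every message
parsed, and keep only the last n of them, then fetch again from there.
Sketching just the bookkeeping (toy message sizes again, not real Kafka
calls):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class TailOffsetSketch {
    // Scan forward from a known-valid offset, keeping only the start
    // offsets of the last n messages seen so far.
    static long offsetOfLastN(long checkpointOffset, int[] messageSizes,
                              int n) {
        Deque<Long> lastN = new ArrayDeque<>();
        long pos = checkpointOffset;
        for (int size : messageSizes) {
            lastN.addLast(pos);
            if (lastN.size() > n) lastN.removeFirst();
            pos += size;
        }
        // Where a final fetch for just the tail would start.
        return lastN.peekFirst();
    }

    public static void main(String[] args) {
        int[] sizes = {300, 250, 450, 120};
        // Last 2 messages start at offsets 550 and 1000; a tail fetch
        // would begin at 550.
        System.out.println(offsetOfLastN(0, sizes, 2));
    }
}
```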

Thanks,
Shane
