Hi Kafka developers,

*The question:* How can I fetch an arbitrary old chunk of messages given a
range definition of [partition, start offset, end offset]? Ideally I would
fetch ranges from multiple partitions at once (one range per partition).
This also needs to work in a concurrent environment.


*My ideas for a solution so far:* I guess I can use a pool of consumers for
the concurrency, and for each fetch use Consumer.seek and Consumer.poll
with max.poll.records. But this seems wrong: there is no guarantee that I
will get the exact same chunk, for example when a message gets deleted (by
log compaction). Overall, seek + poll does not seem like the right fit for
a one-time random fetch.
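
To make that concrete, here is a rough sketch of what I mean (untested;
assumes a pooled, manually assigned KafkaConsumer with
enable.auto.commit=false, and fetchRange is just a name I made up). It
bounds the loop by the end offset instead of relying on max.poll.records:

import java.time.Duration;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.common.TopicPartition;

// Fetch records with startOffset <= offset < endOffset from one partition.
// Note: offsets removed by log compaction simply won't come back, so the
// "exact same chunk" guarantee only holds if nothing was compacted away.
static List<ConsumerRecord<byte[], byte[]>> fetchRange(
        Consumer<byte[], byte[]> consumer,
        TopicPartition tp,
        long startOffset,
        long endOffset) {
    consumer.assign(Collections.singletonList(tp));
    consumer.seek(tp, startOffset);
    List<ConsumerRecord<byte[], byte[]>> chunk = new ArrayList<>();
    long nextOffset = startOffset;
    int emptyPolls = 0;
    // One poll may return only part of the range, so loop until we pass
    // endOffset (or give up after a few empty polls, e.g. if the range
    // lies beyond the current end of the log).
    while (nextOffset < endOffset && emptyPolls < 3) {
        ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofSeconds(1));
        if (records.isEmpty()) {
            emptyPolls++;
            continue;
        }
        emptyPolls = 0;
        for (ConsumerRecord<byte[], byte[]> r : records.records(tp)) {
            if (r.offset() >= endOffset) {
                return chunk;
            }
            chunk.add(r);
            nextOffset = r.offset() + 1;
        }
    }
    return chunk;
}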

*My use case:* Like a typical consumer, mine reads 10MB chunks of messages
and processes them. To process a chunk, I push 3-20 jobs to different
topics as part of a workflow. My goal is to avoid pushing the same chunk
into the other topics again and again; it seems better to push a reference
to that chunk, e.g. [topic X, partition Y, start offset, end offset].
Then, when each job is processed, it fetches the exact same chunk again.
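
For illustration, the reference I would push as the job payload is
something like this (names are mine, serialization omitted); each job
would then rehydrate the chunk with something like the fetchRange sketch
above:

// Hypothetical job payload: a pointer to a chunk of messages, so workers
// re-fetch the records instead of receiving copies of them.
public final class ChunkRef {
    public final String topic;
    public final int partition;
    public final long startOffset; // inclusive
    public final long endOffset;   // exclusive

    public ChunkRef(String topic, int partition, long startOffset, long endOffset) {
        this.topic = topic;
        this.partition = partition;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }
}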
I also posted this question on SO:
https://stackoverflow.com/q/55950565/1265306

-- 
Thanks,
Nitzan
