Hi Kafka developers,

*The question:* How can I fetch an arbitrary old chunk of messages given a range definition of [partition, start offset, end offset]? Ideally this would work for ranges from multiple partitions at once (one range per partition), and it needs to be supported in a concurrent environment too.
*My ideas for a solution so far:* I guess I could use a pool of consumers for the concurrency and, for each fetch, use Consumer.seek and Consumer.poll with max.poll.records. But this seems wrong: there is no guarantee that I will get back exactly the same chunk, for example when a message has been deleted in between (by log compaction). On the whole, seek + poll does not feel like the right fit for a one-time random fetch (see the sketch at the end of this mail).

*My use case:* Like a typical consumer, mine reads 10MB chunks of messages and processes them. To process such a chunk, I push 3-20 jobs to different topics as part of a workflow. My goal is to avoid pushing the same chunk into those other topics again and again, so it seems better to push a reference to the chunk instead, e.g. [Topic X / partition Y, start offset, end offset]. Then, when the jobs are processed, each one would fetch the exact same chunk again.

I also posted this question on SO: https://stackoverflow.com/q/55950565/1265306

--
Thanks,
Nitzan
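P.S. To make the seek + poll idea concrete, here is a minimal sketch of what I mean (Java; ChunkFetcher and fetchChunk are just names I made up, not an existing API). It fetches one [partition, start offset, end offset) range from a manually assigned consumer, and it bounds the loop with position() rather than a record count, so that offset gaps left behind by log compaction don't throw it off:

import java.time.Duration;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ChunkFetcher {

    // Fetch the records currently stored in [startOffset, endOffset) of one
    // partition. The consumer is assumed to come from a pool and to use
    // manual assignment (no consumer group subscription, no rebalancing).
    static List<ConsumerRecord<byte[], byte[]>> fetchChunk(
            KafkaConsumer<byte[], byte[]> consumer,
            String topic, int partition,
            long startOffset, long endOffset) {

        TopicPartition tp = new TopicPartition(topic, partition);
        consumer.assign(Collections.singletonList(tp));
        consumer.seek(tp, startOffset);

        // Don't try to read past the current log end, or poll() would keep
        // returning nothing for a range that doesn't (or no longer) exists.
        long logEnd = consumer.endOffsets(Collections.singletonList(tp)).get(tp);
        long stop = Math.min(endOffset, logEnd);

        List<ConsumerRecord<byte[], byte[]>> chunk = new ArrayList<>();
        while (consumer.position(tp) < stop) {
            ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<byte[], byte[]> record : records.records(tp)) {
                if (record.offset() >= stop) {
                    break; // poll() may overshoot the end of the range
                }
                chunk.add(record);
            }
            // position() advances past compaction gaps automatically, which is
            // why the loop is bounded on it instead of on chunk.size().
        }
        return chunk;
    }
}

For the concurrency part, each thread would take its own consumer from the pool, since KafkaConsumer itself is not thread-safe. Of course this still can't return messages that compaction has already removed from the range, which is exactly the guarantee I'm unsure how to get.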