Hi all,
I have a question about how to do correct caching in a KTable-like
structure on application startup. I'm not sure whether this belongs on
the user or the dev mailing list, so sorry if I've picked the wrong
one. What I have observed so far:
- if I don't send any data to a Kafka partition for longer than the
data retention interval, then all data in the partition is wiped out
- the index file is not cleared (which is obvious, it has to keep
track of the next offset to assign to a new message); a small check
illustrating this is sketched below
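
To illustrate, something along these lines (just a minimal sketch with
the plain Java consumer; the broker address and topic name are
placeholders) shows the effect - after retention has expired everything,
the log start offset has caught up to the end offset, so the messages
are gone but the next offset to assign is remembered:

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class OffsetsAfterRetention {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");  // placeholder broker
            props.put("key.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                TopicPartition tp = new TopicPartition("my-topic", 0);  // placeholder topic
                long start = consumer.beginningOffsets(Collections.singleton(tp)).get(tp);
                long end = consumer.endOffsets(Collections.singleton(tp)).get(tp);
                // once retention has wiped the whole partition, start == end (> 0):
                // no data is left, but the broker still remembers the next offset
                System.out.println("log start=" + start + ", log end=" + end);
            }
        }
    }
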
In my scenario, on startup I want to read all data from a topic (or a
subset of its partitions), wait until all the old data has been cached,
and only then start processing a different stream (basically I'm doing
a join of a KStream and a KTable, but I have implemented it manually
due to some special behavior).

Now, here is the issue: when a specific partition doesn't receive any
message within the retention period, I end up stuck trying to prefetch
data into the "KTable". The broker tells me the offset of the last
message (plus 1), but I never receive any data (until I send a new
message to the partition). The problem I see is that Kafka tells me
what the last offset in a partition is, but there is no upper bound on
when the first message will arrive, even though I reset the offset and
start reading from the beginning of the partition.

My question is: would it be possible not to clear the whole partition,
but to always keep at least the last message? That way, the client
would always receive at least that message, could figure out that it
has reached the end of the partition (i.e. has read all the old data),
and could start processing. I believe the KTable implementation could
hit a very similar issue. Or is there another way around this? I could
add a timeout, but that seems a little fragile.
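
To make the stall concrete, here is a minimal sketch of the kind of
prefetch loop I mean (plain Java consumer, reasonably recent client;
broker address and topic name are placeholders, and my real code is
more involved): read the end offset, seek to the beginning, and treat
the partition as caught up once a polled record reaches that offset.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class Prefetch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");  // placeholder broker
            props.put("enable.auto.commit", "false");          // no group/commits needed here
            props.put("key.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                TopicPartition tp = new TopicPartition("cache-topic", 0);  // placeholder topic
                consumer.assign(Collections.singleton(tp));
                consumer.seekToBeginning(Collections.singleton(tp));

                long endOffset = consumer.endOffsets(Collections.singleton(tp)).get(tp);
                Map<String, String> cache = new HashMap<>();

                long lastSeen = -1L;
                // prefetch: cache everything up to the end offset reported by the broker
                while (lastSeen < endOffset - 1) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> r : records) {
                        cache.put(r.key(), r.value());
                        lastSeen = r.offset();
                    }
                    // if retention wiped the partition, endOffset is still > 0 but no
                    // record ever arrives, so this loop never terminates - the stall
                }
                // ...only here would I start consuming and joining the other stream...
            }
        }
    }

If the partition kept at least its last message, the loop above would
always receive that record and could terminate deterministically.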
Thanks in advance for any suggestions and opinions,
Jan