Hi Matthias,
Just a couple of questions about that...
On 12/27/18 9:57 PM, Matthias J. Sax wrote:
All data is backed up in the Kafka cluster. Data that is stored locally is
basically a cache, and Kafka Streams will recreate the local data if you
lose it.
Thus, I am not sure how the KTable data could be stale. One possibility
might be a misconfiguration: I assume that you read the topic directly
as a table (i.e., builder.table("topic")). If you do this, the input
topic must be configured with log compaction. If it is configured with
retention instead, you might lose data from the input topic, and if you
also lose the local cache, Kafka Streams cannot recreate the local
state because it was deleted from the topic (log compaction guards the
input topic against this data loss).
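For illustration, a minimal sketch of the setup I have in mind (topic
name is a placeholder, and I am relying on the default serdes):

    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.KTable;

    StreamsBuilder builder = new StreamsBuilder();
    // Read the topic directly as a table; the local RocksDB store is
    // only a cache of the topic's contents and is rebuilt from it.
    KTable<String, String> table = builder.table("topic");

    // For this to be safe, "topic" must be created with
    // cleanup.policy=compact (not the default retention-based delete),
    // so the latest value per key is never removed from the topic.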
Is it really necessary to keep the whole log topic of a particular local
store? Such a log would grow indefinitely, and replaying it would take
more and more time. Does Kafka Streams write just the changelog to such a
topic, or does it eventually write a snapshot of the current store too?
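By "such topic" I mean the changelog that Kafka Streams creates when a
store is materialized; a rough sketch, with made-up store name and serdes:

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.common.utils.Bytes;
    import org.apache.kafka.streams.kstream.Materialized;
    import org.apache.kafka.streams.state.KeyValueStore;

    // Materializing a store makes Kafka Streams back it with a changelog
    // topic named "<application.id>-my-store-changelog", which is what
    // gets replayed to rebuild the local state after it is lost.
    Materialized<String, Long, KeyValueStore<Bytes, byte[]>> materialized =
        Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("my-store")
            .withKeySerde(Serdes.String())
            .withValueSerde(Serdes.Long());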
If I configure 'num.standby.replicas' with a number > 0, will such
replicas keep their own synchronized local stores on disk? In that case,
losing an active stream processor would fall back to a standby processor,
and the log would only have to be replayed from the first offset that has
not yet been applied to the local store of the standby processor, meaning
that we don't need to keep the whole log. But we then need to be sure not
to lose the local stores of all standby and active processor(s). In case
a standby processor that has been chosen for promotion to the active role
no longer has its local store, the whole log will be needed again,
right?
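Something like this is what I mean (application id and bootstrap servers
are placeholders):

    import java.util.Properties;
    import org.apache.kafka.streams.StreamsConfig;

    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    // Keep one warm copy of each task's local stores on another
    // instance, so failover does not have to replay the whole changelog:
    props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);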
Suppose that the store that is needed is a WindowStore, which only keeps
data for a limited number of past windows. Would it be possible to
reconstruct the "usable" part of such a store from a limited number of
recent log records, so that the full log would not be necessary?
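For concreteness, I am thinking of a topology along these lines (topic
and store names are made up; this assumes a Kafka Streams version with
the Duration-based APIs):

    import java.time.Duration;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.common.utils.Bytes;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.Materialized;
    import org.apache.kafka.streams.kstream.TimeWindows;
    import org.apache.kafka.streams.state.WindowStore;

    StreamsBuilder builder = new StreamsBuilder();
    // Count events per key in 5-minute windows; the backing WindowStore
    // drops windows older than the retention period, so only recent
    // windows exist in the local state anyway.
    builder.stream("events", Consumed.with(Serdes.String(), Serdes.String()))
        .groupByKey()
        .windowedBy(TimeWindows.of(Duration.ofMinutes(5)))
        .count(Materialized.<String, Long, WindowStore<Bytes, byte[]>>as("windowed-counts")
            .withRetention(Duration.ofHours(1)));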
Regards, Peter
-Matthias
On 12/24/18 12:22 PM, Edmondo Porcu wrote:
Hello Kafka users,
we are running a Kafka Streams application as fully stateless, meaning
that we do not persist /tmp/kafka-streams on a durable volume but rather
lose it at each restart. This application performs a KTable-KTable join
of data coming from Kafka Connect, and sometimes we want to force the
output to be re-emitted, so we update records in the right table in the
database, but we see that the left table is "stale".
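Roughly, the topology looks like this (topic names and the joiner are
simplified placeholders):

    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.KTable;

    StreamsBuilder builder = new StreamsBuilder();
    // Both topics are fed by Kafka Connect; names are placeholders.
    KTable<String, String> left = builder.table("left-topic");
    KTable<String, String> right = builder.table("right-topic");
    // Inner KTable-KTable join: an update on either side re-emits the
    // joined value for that key.
    left.join(right, (l, r) -> l + "|" + r)
        .toStream()
        .to("output-topic");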
Is it possible that, because of reboots, the application loses some
messages? How is the state reconstructed when /tmp/kafka-streams is not
available? Is the state saved in an intermediate topic?
Thanks,
Edmondo