Kafka Streams: retention and stream replay

Dmitry Minkovsky Mon, 07 Aug 2017 08:43:39 -0700

One of the most appealing features of the streams-based architecture is the
ability to replay history. This concept was highlighted in a blog post
[0] just the other day.


Practically, though, I am stuck on the mechanics of replaying data when
that data is also periodically expiring. If your logs expire after some
time, how can you replay state? This may not be a problem for certain kinds
of analysis, especially windowed analysis.

However, let's say a topic with time-based retention consists of logical
application events like "user-create" and "user-update". If the
"user-create" event is deleted, subsequent "user-update" events for that
user are no longer replayable, because there is no initial state left to
apply them to. The streams application transforms "user-create" and
"user-update" events into a compacted entity topic "user". This topic can
be replayed, but that is different from replaying the actual events that
produced the compacted entity.
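
To make the problem concrete, here is a minimal sketch in plain Python that
simulates it without Kafka. Everything in it is an assumption made for
illustration: the event tuple shape, the apply_retention and replay helpers,
and the sample data are hypothetical and are not part of any Kafka API.

```python
from typing import Dict, List, Tuple

# Hypothetical event shape: (timestamp, event_type, user_id, fields)
Event = Tuple[int, str, str, Dict[str, str]]


def apply_retention(log: List[Event], now: int, retention: int) -> List[Event]:
    """Simulate time-based retention: drop events older than now - retention."""
    return [event for event in log if event[0] >= now - retention]


def replay(log: List[Event]) -> Tuple[Dict[str, Dict[str, str]], List[Event]]:
    """Fold events into a user table, similar in spirit to building the
    compacted "user" topic. Updates with no prior create are returned
    separately because there is no state to apply them to."""
    users: Dict[str, Dict[str, str]] = {}
    orphans: List[Event] = []
    for event in log:
        _, kind, user_id, fields = event
        if kind == "user-create":
            users[user_id] = dict(fields)
        elif kind == "user-update":
            if user_id in users:
                users[user_id].update(fields)
            else:
                orphans.append(event)
    return users, orphans


log: List[Event] = [
    (1, "user-create", "u1", {"name": "Ann", "email": "ann@example.com"}),
    (5, "user-update", "u1", {"email": "ann@new.example.com"}),
]

# Replaying the full log rebuilds the complete entity.
full_users, full_orphans = replay(log)

# After retention removes the create event, the update cannot be applied.
expired_log = apply_retention(log, now=6, retention=3)
expired_users, expired_orphans = replay(expired_log)
```

In this sketch, replaying the full log yields the complete "u1" entity,
while replaying after retention leaves an empty user table and one orphaned
"user-update" event.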

So how do I make sense of retention and replay?

Thank you,
Dmitry




[0] https://www.confluent.io/blog/messaging-single-source-truth/
