Hello Dmitry,

The right way to think of reprocessing is that:
1) you can reprocess the source stream from a given Kafka topic only within the source topic's retention period. For example, if an event happens and is produced to the source topic at time t0, and that topic's retention period is t1, then that event can be reprocessed until t0 + t1.

2) if your input topic's messages are not independent, i.e. it is not an append-only record stream but its messages are logically dependent on each other (you can read about this in https://kafka.apache.org/0110/documentation/streams/developer-guide#streams_kstream_ktable), then you cannot logically reprocess it at ANY given point in history. For example, if your input source stream is actually a changelog-like stream, such that a "user-create" event happens at t0 and a "user-update" event happens at t2, then if you reprocess from a time t1 where t0 < t1 < t2, you are restarting from an "inconsistent" state of the source stream.

So the right way is to configure your input source topic to use the key-based compaction policy rather than time-based retention. To prevent such a topic from growing indefinitely, we are considering making the retention point configurable (https://cwiki.apache.org/confluence/display/KAFKA/KIP-47+-+Add+timestamp-based+log+deletion+policy): users could specify up to which point the processing has been done and is final, and will never need to be reprocessed, so that brokers can truncate the log up to that point. I have appended two small Java sketches after the quoted message below: one showing the compaction config change with the AdminClient, and one showing the event-to-entity pattern you describe.

Guozhang

On Mon, Aug 7, 2017 at 8:42 AM, Dmitry Minkovsky <dminkov...@gmail.com> wrote:

> One of the most appealing features of the streams-based architecture is
> the ability to replay history. This concept was highlighted in a blog
> post [0] just the other day.
>
> Practically, though, I am stuck on the mechanics of replaying data when
> that data is also periodically expiring. If your logs expire after some
> time, how can you replay state? This may not be a problem for certain
> kinds of analysis, especially windowed analysis.
>
> However, let's say your retained topic consists of logical application
> events like "user-create" and "user-update". If the "user-create" event
> is deleted, subsequent "user-update" events for that user are no longer
> replayable. The streams application transforms "user-create" and
> "user-update" events into a compacted entity topic, "user". This topic
> can be replayed, but that is different from replaying the actual events
> that produced the compacted entity.
>
> So how do I make sense of retention and replay?
>
> Thank you,
> Dmitry
>
> [0] https://www.confluent.io/blog/messaging-single-source-truth/
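Here is the first sketch mentioned above: switching a source topic from time-based retention to key-based compaction with the Java AdminClient (available since 0.11.0). This is only an illustration under assumptions that are not in the thread: the topic name "user-events" and the localhost bootstrap server are placeholders, and a real application would add error handling around the futures.

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;
import org.apache.kafka.common.config.TopicConfig;

public class CompactSourceTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic =
                new ConfigResource(ConfigResource.Type.TOPIC, "user-events");

            // Point 1 above: the replay window is bounded by retention.ms;
            // an event produced at t0 is reprocessable only until t0 + t1.
            Config current = admin.describeConfigs(
                Collections.singleton(topic)).all().get().get(topic);
            System.out.println("retention.ms = "
                + current.get(TopicConfig.RETENTION_MS_CONFIG).value());

            // Switch to key-based compaction: the latest record per key is
            // kept indefinitely, so the topic stays consistently replayable.
            Config compacted = new Config(Collections.singletonList(
                new ConfigEntry(TopicConfig.CLEANUP_POLICY_CONFIG,
                                TopicConfig.CLEANUP_POLICY_COMPACT)));
            admin.alterConfigs(
                Collections.singletonMap(topic, compacted)).all().get();
        }
    }
}

The same change can also be made from the command line with the kafka-configs.sh tool, if you prefer not to do it programmatically.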
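And here is a sketch of the event-to-entity pattern from the quoted message, written against the current Kafka Streams DSL. Everything in it is assumed for illustration: the topics "user-events" (events keyed by user id) and "user", String serdes throughout, and a last-write-wins reducer standing in for real event-application logic.

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KTable;

public class UserEntityBuilder {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "user-entity-builder");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG,
                  Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG,
                  Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Fold the event stream ("user-create", "user-update", ...) into the
        // latest state per user id. reduce() keeps one value per key, so the
        // result is a changelog rather than an append-only event stream.
        KTable<String, String> users = builder.<String, String>stream("user-events")
                .groupByKey()
                .reduce((currentState, event) -> event); // placeholder merge

        // Write the entity changelog out; the "user" topic should be
        // configured with cleanup.policy=compact so it never expires.
        users.toStream().to("user");

        new KafkaStreams(builder.build(), props).start();
    }
}

Note that Kafka Streams backs the reduce() state with an internal compacted changelog topic of its own, so the aggregation state stays recoverable even after the "user-events" input expires; Dmitry's point stands that this recovers the latest entity states, not the original events.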