I'd like to second the discouragement of adding a new topic per job. We went down this path in Samza and I think the result was quite a mess. You had to read the full topic every time a job started and so it added a lot of overhead and polluted the topic space.
What if we did the following: 1. Use timestamp instead of offset 2. Store the "stopping timestamp" in the metadata field associated with the existing offset storage mechanism 3. Don't worry about fully processing the entire DAG. After all, partially processing a tuple isn't much different from not processing it, and in any case the stopping point is a heuristic so no point in being overly precise here. Probably I'm missing something, though, I haven't thought through the implications of using time instead of offset. -Jay On Mon, Nov 28, 2016 at 10:47 AM, Matthias J. Sax <matth...@confluent.io> wrote: > Hi all, > > I want to start a discussion about KIP-95: > > https://cwiki.apache.org/confluence/display/KAFKA/KIP- > 95%3A+Incremental+Batch+Processing+for+Kafka+Streams > > Looking forward to your feedback. > > > -Matthias > > >