I'd like to second the discouragement of adding a new topic per job. We
went down this path in Samza and I think the result was quite a mess. You
had to read the full topic every time a job started and so it added a lot
of overhead and polluted the topic space.

What if we did the following:

   1. Use timestamp instead of offset
   2. Store the "stopping timestamp" in the metadata field associated with
   the existing offset storage mechanism
   3. Don't worry about fully processing the entire DAG. After all,
   partially processing a tuple isn't much different from not processing it,
   and in any case the stopping point is a heuristic so no point in being
   overly precise here.

Probably I'm missing something, though, I haven't thought through the
implications of using time instead of offset.

-Jay

On Mon, Nov 28, 2016 at 10:47 AM, Matthias J. Sax <matth...@confluent.io>
wrote:

> Hi all,
>
> I want to start a discussion about KIP-95:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 95%3A+Incremental+Batch+Processing+for+Kafka+Streams
>
> Looking forward to your feedback.
>
>
> -Matthias
>
>
>

Reply via email to