Thanks for your input. To clarify: the main reason to add the metadata topic is to cope with subtopologies that are connected via intermediate topics (either user-defined via through() or internally created for data repartitioning).
Without this handling, the behavior would be odd and the user experience would be bad. Thus, using the metadata topic to have a "fixed HW" is just a small add-on -- and more or less for free, because the metadata topic is already there.

-Matthias

On 11/29/16 7:53 PM, Neha Narkhede wrote:
> Thanks for initiating this. I think this is a good first step towards
> unifying batch and stream processing in Kafka.
>
> I understood this capability to be simple yet very useful; it allows a
> Streams program to process a log, in batch, in arbitrary windows defined by
> the difference between the HW and the current offset. Basically, it
> provides a simple means for a Streams program to "stop" after processing a
> batch, stop (just like a batch program would) and continue where it left
> off when restarted. In other words, it allows batch processing behavior for
> a Streams app without code changes.
>
> This feature is useful but I do not think there is a necessity to add a
> metadata topic. After all, the user doesn't really care as much about
> exactly where the batch ends. This feature allows an app to "process as
> much as there is data to process" and the way it determines how much data
> there is to process is by reading the HW on startup. If there is new data
> written to the log right after it starts up, it will process it when
> restarted the next time. If it starts, reads HW but fails, it will restart
> and process a little more before it stops again. The fact that the HW
> changes in some scenarios isn't an issue since a batch program that behaves
> this way doesn't really care exactly what that HW is.
>
> There might be cases which require adding more topics but I would shy away
> from adding complexity wherever possible as it complicates operations and
> reduces simplicity.
>
> Other than this issue, I'm +1 on adding this feature. I think it is pretty
> powerful.
>
>
> On Mon, Nov 28, 2016 at 10:48 AM Matthias J. Sax <matth...@confluent.io>
> wrote:
>
>> Hi all,
>>
>> I want to start a discussion about KIP-95:
>>
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-95%3A+Incremental+Batch+Processing+for+Kafka+Streams
>>
>> Looking forward to your feedback.
>>
>>
>> -Matthias
>>
>>
>> --
> Thanks,
> Neha
>
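
For readers following the thread, the "read the HW on startup, process up to it, stop, and continue on the next run" behavior discussed above can be sketched roughly as follows. This is only a plain-Java simulation under stated assumptions: the log is an in-memory list, the HW is taken to be the list size at startup, and the names FixedHwBatchSketch and runOnce are hypothetical. It is not Kafka Streams code and not the KIP-95 implementation, and it does not model the metadata topic or intermediate topics.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of "stop at the HW read on startup".
// The in-memory list stands in for a single topic partition.
class FixedHwBatchSketch {

    // Processes records from committedOffset up to the end offset captured
    // at startup and returns the new committed offset for the next run.
    public static long runOnce(List<String> log, long committedOffset, List<String> processed) {
        // Assumption: the "HW" is captured once, when the run starts.
        // Records appended after this point are left for the next run.
        final long stopOffset = log.size();
        long offset = committedOffset;
        while (offset < stopOffset) {
            processed.add(log.get((int) offset)); // stand-in for real processing
            offset++;
        }
        return offset;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>(List.of("a", "b", "c"));
        List<String> out = new ArrayList<>();

        long committed = runOnce(log, 0L, out);   // first run: stops at offset 3
        log.add("d");                             // data written after the run ended
        committed = runOnce(log, committed, out); // second run: resumes at offset 3

        System.out.println(out + " committed=" + committed);
    }
}
```

In this sketch each run resumes from the returned offset, so nothing is reprocessed and data written after startup simply waits for the next run.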