Re: [DISCUSS] KIP-95: Incremental Batch Processing for Kafka Streams

Matthias J. Sax Tue, 29 Nov 2016 21:45:58 -0800

Thanks for your input.

To clarify: the main reason to add the metadata topic is to cope with
subtopologies that are connected via intermediate topics (either
user-defined via through() or internally created for data repartitioning).


Without this handling, the behavior would be odd and the user experience
would be bad.

Thus, using the metadata topic to have a "fixed HW" is just a small
add-on -- and more or less for free, because the metadata topic is
already there.


-Matthias


On 11/29/16 7:53 PM, Neha Narkhede wrote:
> Thanks for initiating this. I think this is a good first step towards
> unifying batch and stream processing in Kafka.
> 
> I understood this capability to be simple yet very useful; it allows a
> Streams program to process a log, in batch, in arbitrary windows defined by
> the difference between the HW and the current offset. Basically, it
> provides a simple means for a Streams program to "stop" after processing a
> batch (just like a batch program would) and continue where it left off
> when restarted. In other words, it allows batch processing behavior for
> a Streams app without code changes.
> 
> This feature is useful but I do not think there is a necessity to add a
> metadata topic. After all, the user doesn't really care as much about
> exactly where the batch ends. This feature allows an app to "process as
> much as there is data to process" and the way it determines how much data
> there is to process is by reading the HW on startup. If there is new data
> written to the log right after it starts up, it will process it when
> restarted the next time. If it starts, reads HW but fails, it will restart
> and process a little more before it stops again. The fact that the HW
> changes in some scenarios isn't an issue since a batch program that behaves
> this way doesn't really care exactly what that HW is.
> 
> There might be cases which require adding more topics but I would shy away
> from adding complexity wherever possible as it complicates operations.
> 
> Other than this issue, I'm +1 on adding this feature. I think it is pretty
> powerful.
> 
> 
> On Mon, Nov 28, 2016 at 10:48 AM Matthias J. Sax <matth...@confluent.io>
> wrote:
> 
>> Hi all,
>>
>> I want to start a discussion about KIP-95:
>>
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-95%3A+Incremental+Batch+Processing+for+Kafka+Streams
>>
>> Looking forward to your feedback.
>>
>>
>> -Matthias
>>
>>
>> --
> Thanks,
> Neha
>
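
For readers of the archive, the stop-at-HW behavior Neha describes above
(read the HW once on startup, process up to it, stop, and resume from the
committed offset on the next run) can be sketched roughly as follows. This
is only an in-memory illustration, not Kafka Streams code and not the
KIP-95 design: the names Log and BatchApp and the fail_after parameter are
hypothetical, and a plain Python list stands in for a topic partition.

```python
from dataclasses import dataclass, field


@dataclass
class Log:
    """Hypothetical in-memory stand-in for a single topic partition."""
    records: list = field(default_factory=list)

    def end_offset(self) -> int:
        # One past the last record; plays the role of the HW in this sketch.
        return len(self.records)


@dataclass
class BatchApp:
    """Hypothetical app that processes up to the HW seen at startup."""
    log: Log
    committed: int = 0  # assumed to survive "restarts" of the app
    processed: list = field(default_factory=list)

    def run(self, fail_after=None) -> int:
        # Read the stop point once; records appended later wait for the next run.
        stop_at = self.log.end_offset()
        count = 0
        while self.committed < stop_at:
            if fail_after is not None and count == fail_after:
                raise RuntimeError("simulated crash")
            self.processed.append(self.log.records[self.committed])
            self.committed += 1
            count += 1
        return count


log = Log(["a", "b", "c"])
app = BatchApp(log)
first = app.run()                # processes a, b, c and stops
log.records.extend(["d", "e"])   # data written after the first run
second = app.run()               # picks up d, e where the first run left off
```

If a run crashes partway and more data arrives before the restart, the next
run in this sketch simply reads a larger stop point and processes "a little
more", which matches the scenario described in the quoted message.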
