Re: [DISCUSS] KIP-95: Incremental Batch Processing for Kafka Streams

Matthias J. Sax Mon, 05 Dec 2016 09:57:07 -0800

1) Changing the metadata topic so that all instances consume it should
not be a big change. We want to leverage the "restore consumer", which
already does manual partition assignment.

2) I understand your concern about adding the metadata topic... For a
single instance with one thread, it would be easy to go in-memory -- for
a multi-threaded single instance, it would also be possible, even if we
need to do thread synchronization. However, going in-memory solves only
the issue with intermediate topics -- not the "moving EOL" in case of
failure. IMHO, having fixed "stop offsets" is already worth persisting
reliably (for a single instance, we could go to disk though).
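
To make the "moving EOL" point concrete, here is a minimal sketch of the
single-instance, on-disk variant. It is only an illustration and not what
the KIP specifies: the class StopOffsets, its methods, and the properties
file layout are all hypothetical, and plain maps stand in for the end
offsets a consumer would report. The idea is that the stop offsets are
fixed once and written to disk, so a restart after a failure reuses them
instead of picking up the (by then larger) current end offsets.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical helper for illustration; not part of Kafka Streams or KIP-95. */
final class StopOffsets {
    private StopOffsets() {}

    /**
     * Returns the stop offsets stored in the file if it exists. Otherwise
     * fixes the given current end offsets as stop offsets, writes them to
     * the file, and returns them. Keys are partition numbers.
     */
    static Map<Integer, Long> loadOrFix(Path file, Map<Integer, Long> currentEndOffsets) {
        try {
            Properties props = new Properties();
            if (Files.exists(file)) {
                // Restart after a failure: ignore the moved end offsets.
                try (InputStream in = Files.newInputStream(file)) {
                    props.load(in);
                }
                Map<Integer, Long> stored = new TreeMap<>();
                for (String key : props.stringPropertyNames()) {
                    stored.put(Integer.valueOf(key), Long.valueOf(props.getProperty(key)));
                }
                return stored;
            }
            // First start: fix the stop offsets and persist them.
            for (Map.Entry<Integer, Long> e : currentEndOffsets.entrySet()) {
                props.setProperty(e.getKey().toString(), e.getValue().toString());
            }
            try (OutputStream out = Files.newOutputStream(file)) {
                props.store(out, "stop offsets");
            }
            return new TreeMap<>(currentEndOffsets);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** True once the next position to read has reached the partition's stop offset. */
    static boolean reachedStop(Map<Integer, Long> stopOffsets, int partition, long position) {
        Long stop = stopOffsets.get(partition);
        return stop != null && position >= stop;
    }
}
```

With this sketch, a second call to loadOrFix() with larger end offsets
still returns the offsets fixed by the first call, which is the behavior
an in-memory-only approach would lose after a crash.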

So, strictly speaking, we do not need to persist the data in a topic; we
could also implement our own network broadcast (the required host
information is already there via the IQ feature). Thus, the leader would
send the information via an extra network connection and all instances
would save it to disk. For intermediate topics, all running instances must
broadcast the "stop offsets" to all others during runtime, too. Overall,
I think this would not simplify the solution compared to using a topic.

However, I cannot follow here:

> I understand the need if we have an application with multiple instances on 
> different servers
> (but I still don't think we need to handle that case).

Why do you think we do not need to handle the distributed case? Isn't
this the most relevant one? I would assume that only a minority of
applications will be single instance (even if a single instance might be
good enough from a performance point of view, I guess people tend to
start at least two for fault-tolerance).

-Matthias

On 12/4/16 2:45 AM, Eno Thereska wrote:
> A couple of remaining questions:
> 
> - it says in the KIP: "because the metadata topic must be consumed by all 
> instances, we need to assign the topic’s partitions manually and do not 
> commit offsets -- we also need to seekToBeginning() each time we consume the 
> metadata topic)" . How big of a change is allowing a topic to be consumed by 
> all instances?
> 
> - for the case when we have a single instance, with intermediate topics etc, 
> could we keep the data that we want to persist in the metadata topic in 
> memory instead? What is the advantage of persisting this data? I understand 
> the need if we have an application with multiple instances on different 
> servers (but I still don't think we need to handle that case). Is there a 
> need to persist data for a single instance? It would help if we enumerate the 
> exact failure scenarios and how persisting the data helps. So I think you 
> convinced me that this metadata is useful in the previous email, now I'm 
> asking if it needs persisting.
> 
> I'm really trying to avoid having the metadata topic. It's one more topic 
> that needs to be kept around and maintained carefully with all failure cases 
> considered. With EoS around the corner introducing its own internal topic(s), 
> and atomicity when writing to multiple topics, in my mind there is real value 
> if we can have a solution without an extra topic for now.
> 
> 
> Thanks
> Eno
> 
> 
> 
>> On 28 Nov 2016, at 18:47, Matthias J. Sax <matth...@confluent.io> wrote:
>>
>> Hi all,
>>
>> I want to start a discussion about KIP-95:
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-95%3A+Incremental+Batch+Processing+for+Kafka+Streams
>>
>> Looking forward to your feedback.
>>
>>
>> -Matthias
>>
>>
>
