Re: Few questions about how Kafka Streams manages tasks

2017-08-17 Thread Anish Mashankar
Yes. I'm doing that. But in this case the container orchestration system is killing it due to OOM or loss of node (I am simulating various failure scenarios). But yeah, the fact that kafka streams only commits offsets of messages that have been through the entire pipeline kind has enabled me to tak

Re: Few questions about how Kafka Streams manages tasks

2017-08-16 Thread Guozhang Wang
I see. For normal maintenance operations, before you kill your container you could shuts down the Streams application by calling `KafkaStreams#close()`. Upon shutting down it would write a local checkpoint file indicating at which point in terms of offsets it has stopped at. So on resuming if the

Re: Few questions about how Kafka Streams manages tasks

2017-08-15 Thread Anish Mashankar
Hi Guozhang, Thanks for the reply. By taking a lot of time I meant that I see a log message `Restoring state from changelog topics `, followed by just some kafka consumer logs like `Discovered coordinator`. Looking at this I assumed that the Stream threads are waiting for the states to be r

Re: Few questions about how Kafka Streams manages tasks

2017-08-15 Thread Guozhang Wang
Hi Anish, 1. The committed offsets are for the messages that have been through the entire pipeline. Plus when we are committing, we make sure all state store caches are flushed so there should be no messages that are "in the middle of the topology". If there is a failure before the commit, then so

Few questions about how Kafka Streams manages tasks

2017-08-14 Thread Anish Mashankar
First question: We know that Kafka Streams commits offsets on intervals. But what offsets are committed? Are the offsets for messages committed are the ones which have just arrived at the source node? Or the messages that have been through the entire pipeline? If the latter, how do we avoid data lo