Hi Anish,

1. The committed offsets are for messages that have been through the entire pipeline. In addition, when we commit we make sure that all state store caches are flushed, so there should be no messages left "in the middle of the topology". If there is a failure before the commit, some records may be re-fetched and re-processed, but there is no data loss (i.e. at-least-once). In 0.11.0 we added exactly-once processing support to avoid such duplicate-message scenarios.
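For reference, a minimal sketch of the relevant configs (the property constants are from StreamsConfig; the application id, broker address, and interval value are just placeholders):

    import java.util.Properties;
    import org.apache.kafka.streams.StreamsConfig;

    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");   // placeholder
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");   // placeholder
    // how often offsets are committed (state store caches are flushed first); default is 30000 ms
    props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 30000);
    // available since 0.11.0: switch from at-least-once to exactly-once processing
    props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);

These props are then passed to the KafkaStreams constructor as usual.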
2. To avoid bootstrapping the state from the entire changelog, users can set the number of standby tasks to be non-zero. Standby tasks simply drain the changelogs that their corresponding active tasks are producing to, and continuously build up local state store images. When the host of an active task crashes, the library will try to migrate that task to a host that has the corresponding standby task, if there is one, so that on that host the library does not need to read the changelog from the beginning to restore the state (see the config sketch appended after the quoted mail below). Feel free to read more details here: https://kafka.apache.org/0110/documentation/streams/architecture#streams_architecture_recovery

3. "But this is taking a lot of time as the topics are replayed." I'm not sure I understand this question. Do you mean that the apps never commit offsets, so whenever they are resumed they always fetch from the beginning of the source topics?

Guozhang

On Mon, Aug 14, 2017 at 9:55 PM, Anish Mashankar <an...@systeminsights.com> wrote:

> First question: We know that Kafka Streams commits offsets on intervals.
> But which offsets are committed? Are the committed offsets the ones for
> messages that have just arrived at the source node, or for messages that
> have been through the entire pipeline? If the latter, how do we avoid data
> loss in this case? Is there an equivalent to Connect's RetriableException
> in Kafka Streams?
> ----
> Second question: Does giving the application a host:port allow Kafka Streams
> instances to communicate the data of state stores, so that the entire
> changelog topic is not read every time a crash happens?
> ----
> Third question: If I want to upgrade the application with some kind of
> code changes and bug fixes, how should the upgrade pattern proceed? Right
> now, I just kill all the containers and bring new ones up. But this is
> taking a lot of time as the topics are replayed. How can I do it faster?
> Should I upgrade the processes slowly?
> --
>
> Regards,
> Anish Samir Mashankar
> R&D Engineer
> System Insights
> +91-9789870733
>

--
-- Guozhang
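P.S. For (2), a minimal sketch of the standby-task config (the property constant is from StreamsConfig; the value 1 is just an example, added to the same props object as above):

    // Maintain one warm replica of each task's local state on another instance
    // by continuously draining the changelog, so that fail-over does not have
    // to replay the changelog from scratch.
    props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);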