Thanks Eno. Regarding the merging part, I was talking about merging topics using Kafka Streams only, so that is safe, as you mentioned.

Regarding the restore part, I have another question. Maybe it's a bit naive too. During restore, why does Kafka replay the whole topic / partition to recreate the state in the local state store? Isn't there any way to just take the latest message as the current state? Because that's what it is, right? The last message in the topic / partition IS the latest state. Maybe I am missing something obvious?

regards.

On Fri, Jul 14, 2017 at 6:23 PM, Eno Thereska <eno.there...@gmail.com> wrote:

> Hi Debasish,
>
> Your intuition about the first part is correct. Kafka Streams
> automatically assigns a partition of a topic to a task in an instance.
> It will never be the case that the same partition is assigned to two
> tasks.
>
> About the merging or changing of partitions part, it would help if we
> know more about what you are trying to do. For example, if behind the
> scenes you add or remove partitions that would not work well with
> Kafka Streams. However, if you use the Kafka Streams itself to create
> new topics (e.g., by merging two topics into one, or vice versa by
> taking one topic and splitting it into more topics), then that would
> work fine.
>
> Eno
>
> > On 13 Jul 2017, at 23:49, Debasish Ghosh <ghosh.debas...@gmail.com> wrote:
> >
> > Hi -
> >
> > I have a question which is mostly to clarify some conceptions
> > regarding state management and restore functionality using Kafka
> > Streams ..
> >
> > When I have multiple instances of the same application running (same
> > application id for each of the instances), are the following
> > assumptions correct ?
> >
> > 1. each instance has a separate state store (local)
> > 2. all instances are backed up by a *single* changelog topic
> >
> > Now the question arises, how does restore work in the above case
> > when we have 1 changelog topic backing up multiple state stores ?
> >
> > Each instance of the application ingests data from specific
> > partitions of the topic. And there can be multiple topics too. e.g.
> > if we have m topics with n partitions in each, and p instances of
> > the application, then all the (m x n) partitions are distributed
> > across the p instances of the application. Is this true ?
> >
> > If so, then does the changelog topic also have (m x n) partitions,
> > so that Kafka knows which state to restore in which store in case of
> > a restore operation ?
> >
> > And finally, if we decide to merge topics / partitions in between
> > without complete reset of the application, will (a) it work ? and
> > (b) the changelog topic gets updated accordingly and (c) is this
> > recommended ?
> >
> > regards.
> >
> > --
> > Debasish Ghosh
> > http://manning.com/ghosh2
> > http://manning.com/ghosh
> >
> > Twttr: @debasishg
> > Blog: http://debasishg.blogspot.com
> > Code: http://github.com/debasishg

--
Debasish Ghosh
http://manning.com/ghosh2
http://manning.com/ghosh

Twttr: @debasishg
Blog: http://debasishg.blogspot.com
Code: http://github.com/debasishg
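
P.S. To make the restore question concrete, here is a toy sketch in plain Python. It is not the Kafka Streams API. It assumes the changelog partition is an append-only sequence of (key, value) updates for a key-value store, with a None value standing in for a delete; the names `restore` and `changelog` are made up for illustration. Under that assumption, the last record only carries the latest value for one key, so replaying just that record would not recover the other keys.

```python
# Toy model only: a changelog partition as an ordered list of (key, value)
# updates. None of these names come from Kafka; they are hypothetical.

def restore(changelog):
    """Rebuild a key-value store by replaying every record in order."""
    store = {}
    for key, value in changelog:
        if value is None:
            # A None value models a delete marker for that key.
            store.pop(key, None)
        else:
            # Later updates for the same key overwrite earlier ones.
            store[key] = value
    return store


changelog = [
    ("alice", 1),
    ("bob", 5),
    ("alice", 2),
    ("carol", 7),
    ("bob", None),  # bob is deleted
    ("alice", 3),
]

# Replaying the whole partition yields the latest value for every live key.
full_state = restore(changelog)

# Replaying only the last record yields the state of a single key.
last_only = restore(changelog[-1:])

print(full_state)  # {'alice': 3, 'carol': 7}
print(last_only)   # {'alice': 3}
```

If the store really held a single value rather than many keys, the last record would indeed be the whole state, which may be where my intuition came from.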