Streams does use one changelog topic per store (not just a single global changelog topic per application). Thus, the number of partitions can be different for different stores/changelog topics within one application.
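The per-store changelog layout can be sketched in a few lines of plain Python (no Kafka dependency). The naming pattern `<application.id>-<store name>-changelog` is the one Streams actually uses; the application id, store names, and partition counts below are made-up examples:

```python
# Toy illustration: each state store gets its own changelog topic, named
# "<application.id>-<store name>-changelog", and each changelog's partition
# count matches the partition count of that store's input.
# The app id, store names, and counts here are hypothetical.

def changelog_topic(application_id, store_name):
    return f"{application_id}-{store_name}-changelog"

app_id = "my-streams-app"
stores = {"counts-store": 3, "join-store": 6}  # store -> input partitions

for store, partitions in stores.items():
    print(changelog_topic(app_id, store), "->", partitions, "partitions")
```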
About partition assignment: it depends a bit on the structure of your program (i.e., the DAG structure), but in general, partitions of different topics are co-located. Assume you have 2 input topics T1 and T2 with 3 partitions each and 2 application instances: instance 1 would get T1-0 and T2-0 assigned, and instance 2 would get T1-1 and T2-1 assigned. The remaining partitions T1-2 and T2-2 might be on either instance (so either both on instance 1 or both on instance 2). In this case, the changelog topic would have 3 partitions (the same as the input topics).

About modifying the input topics: this is not allowed and will break your application. After a changelog topic has been created, it is never modified. Thus, if you change the number of input topic partitions, it no longer matches the number of changelog topic partitions, and Streams will raise an exception. Using the reset tool is mandatory in this case to fix it. This blog post gives more details: https://www.confluent.io/blog/data-reprocessing-with-kafka-streams-resetting-a-streams-application/

-Matthias

On 7/14/17 5:53 AM, Eno Thereska wrote:
> Hi Debasish,
>
> Your intuition about the first part is correct. Kafka Streams automatically
> assigns a partition of a topic to a task in an instance. It will never be
> the case that the same partition is assigned to two tasks.
>
> About the merging or changing of partitions: it would help if we knew more
> about what you are trying to do. For example, if you add or remove
> partitions behind the scenes, that would not work well with Kafka Streams.
> However, if you use Kafka Streams itself to create new topics (e.g., by
> merging two topics into one, or vice versa, by taking one topic and
> splitting it into several), then that would work fine.
>
> Eno
>
>> On 13 Jul 2017, at 23:49, Debasish Ghosh <ghosh.debas...@gmail.com> wrote:
>>
>> Hi -
>>
>> I have a question, mostly to clarify some conceptions regarding state
>> management and restore functionality using Kafka Streams.
>>
>> When I have multiple instances of the same application running (the same
>> application id for each of the instances), are the following assumptions
>> correct?
>>
>> 1. Each instance has a separate (local) state store.
>> 2. All instances are backed up by a *single* changelog topic.
>>
>> Now the question arises: how does restore work in the above case, when we
>> have 1 changelog topic backing up multiple state stores?
>>
>> Each instance of the application ingests data from specific partitions of
>> the topic. And there can be multiple topics too. E.g., if we have m topics
>> with n partitions each, and p instances of the application, then all the
>> (m x n) partitions are distributed across the p instances of the
>> application. Is this true?
>>
>> If so, does the changelog topic also have (m x n) partitions, so that
>> Kafka knows which state to restore in which store in case of a restore
>> operation?
>>
>> And finally, if we decide to merge topics/partitions in between, without a
>> complete reset of the application: (a) will it work? (b) does the changelog
>> topic get updated accordingly? and (c) is this recommended?
>>
>> Regards.
>>
>> --
>> Debasish Ghosh
>> http://manning.com/ghosh2
>> http://manning.com/ghosh
>>
>> Twttr: @debasishg
>> Blog: http://debasishg.blogspot.com
>> Code: http://github.com/debasishg
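The co-partitioned assignment Matthias describes earlier in the thread can be modeled as a toy sketch in plain Python: one task per partition index across co-partitioned topics, with the tasks spread over the running instances. The round-robin spread below is a simplification of the real sticky assignor, not Streams' actual algorithm; topic names and counts mirror the T1/T2 example:

```python
# Toy model of co-partitioned task assignment: a task with index i owns
# partition i of every co-partitioned input topic, so T1-0 and T2-0 always
# travel together. The round-robin distribution here is a simplification
# of the real (sticky) assignment logic.

def build_tasks(topics, num_partitions):
    # task i owns partition i of every co-partitioned topic
    return [[f"{t}-{p}" for t in topics] for p in range(num_partitions)]

def assign(tasks, num_instances):
    instances = [[] for _ in range(num_instances)]
    for i, task in enumerate(tasks):
        instances[i % num_instances].append(task)
    return instances

tasks = build_tasks(["T1", "T2"], 3)
assignment = assign(tasks, 2)
for i, owned in enumerate(assignment, start=1):
    print(f"instance {i}: {owned}")
```

With this toy spread, instance 1 ends up with tasks 0 and 2 and instance 2 with task 1; as the reply notes, the "leftover" task can legitimately land on either instance in a real deployment.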