Well, it depends on your program. The reason for the current task creation strategy is joins: if you have two input topics that you want to join, the join happens on a per-partition basis, i.e., topicA-p0 is joined to topicB-p0, etc., and thus both partitions must be assigned to the same task (to get co-partitioned data processed together).
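To make the rule concrete, here is a minimal sketch in plain Java that models how partitions end up grouped into tasks. It is only an illustration, not the actual Kafka Streams assignment code: the class `TaskGroupingSketch` and the method `assignTasks` are hypothetical names. It assumes that each connected part of the program (a sub-topology) gets one task per partition index, up to the largest partition count among its source topics, and that task ids have the form `<subTopology>_<partition>`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative model only; this is not the Kafka Streams partition grouper. */
final class TaskGroupingSketch {

    /**
     * Each map describes one sub-topology: source topic name -> partition count.
     * Returns task id ("<subTopology>_<partition>") -> partitions assigned to that task.
     */
    static Map<String, List<String>> assignTasks(List<Map<String, Integer>> subTopologies) {
        Map<String, List<String>> tasks = new LinkedHashMap<>();
        for (int group = 0; group < subTopologies.size(); group++) {
            // Sort topics by name so the result order is deterministic.
            Map<String, Integer> topics = new TreeMap<>(subTopologies.get(group));

            // The topic with the most partitions determines the number of tasks.
            int maxPartitions = 0;
            for (int count : topics.values()) {
                maxPartitions = Math.max(maxPartitions, count);
            }

            // Task p receives partition p of every source topic that has one.
            for (int p = 0; p < maxPartitions; p++) {
                List<String> assigned = new ArrayList<>();
                for (Map.Entry<String, Integer> topic : topics.entrySet()) {
                    if (p < topic.getValue()) {
                        assigned.add(topic.getKey() + "-p" + p);
                    }
                }
                tasks.put(group + "_" + p, assigned);
            }
        }
        return tasks;
    }

    public static void main(String[] args) {
        // Both topics feed one sub-topology: partitions with the same index share a task.
        System.out.println(assignTasks(List.of(Map.of("topicA", 5, "topicB", 5))));
        // Each topic feeds its own sub-topology: every partition gets its own task.
        System.out.println(assignTasks(List.of(Map.of("topicA", 5), Map.of("topicB", 5))));
    }
}
```

With two 5-partition topics, the first call yields 5 tasks that each hold one partition from each topic, while the second yields 10 tasks with one partition each.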
Note that the following program would create independent tasks, as it consists of two independent sub-topologies:

    builder.stream("topicA").filter().to();
    builder.stream("topicB").filter().to();

However, the next program would be one sub-topology, and thus we apply the "join" rule (as we don't really know whether you actually execute a join or not when we create tasks):

    KStream s1 = builder.stream("topicA");
    builder.stream("topicB").merge(s1).filter().to();

Having said that, I agree that it would be a nice improvement to be more clever about it. However, it is not easy to do. There is actually a related ticket: https://issues.apache.org/jira/browse/KAFKA-9282

Hope this helps.

-Matthias

On 9/2/20 11:09 PM, Pushkar Deole wrote:
> Hi,
>
> I came across articles where it is explained how parallelism is handled in
> kafka streams. This is what I collected:
> When the streams application is reading from multiple topics, the topic
> with maximum number of partitions is considered for instantiating stream
> tasks so 1 task is instantiated per partition.
> Now, if the stream task is reading from multiple topics then the partitions
> of multiple topics are shared among those stream tasks.
>
> For example, Topic A and B has 5 partitions each then 5 tasks are
> instantiated and assigned to 5 stream threads where each task is assigned 1
> partition from Topic A and Topic B.
>
> The question here is : if I would want 1 task to be created for each
> partition from the input topic then is this possible? e.g. I would want to
> have 5 tasks for topic A and 5 for B and then would want 10 threads to
> handle those. How can this be achieved?
>