Re: Kafka streams parallelism - why not separate stream task per partition per input topic

2020-10-14 Thread Pushkar Deole
Matthias, Any recommendations? Also, while doing performance test, I observed that the partitions assigned to stream threads are changing. Why would this happen when the instances are not going down? e.g. i see partitions being changed between stream thread consumers. The highlighted in bold are

Re: Kafka streams parallelism - why not separate stream task per partition per input topic

2020-10-08 Thread Pushkar Deole
I looked at the task assignment and it looked random for some threads: e.g. i have 3 topics 24 partitions each and have 3 instances of application. So, each instance assigned 8 partitions per topic, i.e. total 24 partitions for 3 topics. When I set 8 stream threads, I expected each thread to be as

Re: Kafka streams parallelism - why not separate stream task per partition per input topic

2020-10-07 Thread Matthias J. Sax
Well, there are many what-ifs and I am not sure if there is general advice. Maybe a more generic response: do you actually observe a concrete issue with the task assignment that impacts your app measurable? Or might this be a case of premature optimization? -Matthias On 10/6/20 10:13 AM, Pushkar

Re: Kafka streams parallelism - why not separate stream task per partition per input topic

2020-10-06 Thread Pushkar Deole
So, what do you suggest to address the topic C with lesser traffic? Should we create a separate StreamBuilder and build a separate topology for topic C so we can configure number of threads as per our requirement for that topic? On Tue, Oct 6, 2020 at 10:27 PM Matthias J. Sax wrote: > The curren

Re: Kafka streams parallelism - why not separate stream task per partition per input topic

2020-10-06 Thread Matthias J. Sax
The current assignment would be "round robin". Ie, after all tasks are created, we just take task-by-task and assign one to the threads one-by-one. Note though, that the assignment algorithm might change at any point, so you should not rely on it. We are also not able to know if one topic has les

Re: Kafka streams parallelism - why not separate stream task per partition per input topic

2020-10-06 Thread Pushkar Deole
Matthias, I am just wondering how the tasks will be spread across threads in case I have lesser threads than the number of partitions. Specifically taking my use case, I have 3 inputs topics with 8 partitions each and I can configure 12 threads, so how below topics partitions will be distributed a

Re: Kafka streams parallelism - why not separate stream task per partition per input topic

2020-09-04 Thread Matthias J. Sax
That is correct. If topicA has 5 partitions and topicB has 6 partitions, you get 5 tasks for the first sub-topology and 6 tasks for the second sub-topology and you can run up to 11 threads, each executing one task. -Matthias On 9/4/20 1:30 AM, Pushkar Deole wrote: > Matthias, > > Let's say we

Re: Kafka streams parallelism - why not separate stream task per partition per input topic

2020-09-04 Thread Pushkar Deole
Matthias, Let's say we have independent sub topologies like: in this case, will the streams create tasks equal to the total number of partitions from topicA and topicB, and can we assign stream thread count that is sum of the partition of the two topics? builder.stream("topicA").filter().to(); bu

Re: Kafka streams parallelism - why not separate stream task per partition per input topic

2020-09-03 Thread Matthias J. Sax
Well, it depends on your program. The reason for the current task creating strategy are joins: If you have two input topic that you want to join, the join happens on a per-partition basis, ie, topic-p0 is joined to topicB-p0 etc and thus both partitions must be assigned to the same task (to get co

Kafka streams parallelism - why not separate stream task per partition per input topic

2020-09-02 Thread Pushkar Deole
Hi, I came across articles where it is explained how parallelism is handled in kafka streams. This is what I collected: When the streams application is reading from multiple topics, the topic with maximum number of partitions is considered for instantiating stream tasks so 1 task is instantiated p