Hi Karthik, I think in the current trunk we do effectively load balance across processes (they are named as "clients" in the partition assignor) already. More specifically:
1. Consumer clients embedded a "client UUID" in its subscription so that the leader can group them into a single client, whose "capacity" is the number of threads it has. 2. Suppose there are N tasks in total, and M capacities (i.e. M num.total.threads): then for each client with a capacity of m, it will likely to get (N / M) * m tasks no matter if N > M or N < M. 3. So in the case of N < M even, say in the above example N = 9 and M = 28, each client should have 9 / 28 * 14 = 4.5 tasks. You could try to build your app from Kafka trunk and see if this is the case in your scenario. Never the less, Matthias point is still valid that we do not recommend you ever have N < M since it will result in idle threads. Guozhang On Tue, Mar 21, 2017 at 4:56 PM, Matthias J. Sax <matth...@confluent.io> wrote: > Hi, > > I guess, it's currently not possible to load balance between different > machines. It might be a nice optimization to add into Streams though. > > Right now, you should reduce the number of threads. Load balancing is > based on threads, and thus, if Streams place tasks to all threads of one > machine, it will automatically assign the remaining tasks to thread of > the second machine. > > Btw: If you have only 9 input partitions, you will get most likely 9 > tasks (might be more, depending on your topology structure) and thus, > you cannot utilize more then 9 thread anyway. Thus, running with 28 > thread will most likely result in many idle threads. > > See the docs for more details: > > - > http://docs.confluent.io/current/streams/architecture. > html#parallelism-model > - > http://docs.confluent.io/current/streams/architecture.html#threading-model > > > > -Matthias > > On 3/21/17 3:40 PM, Prasad, Karthik wrote: > > Hey, > > > > I have a typical scenario of a kafka-streams application in a production > environment. > > > > We have a kafka-cluster with multiple topics. Messages from one topic is > being consumed by a the kafka-streams application. The topic, currently, > has 9 partitions. We have configured consumer thread count to 14. We are > running 2 instances of this stream application on 2 different machines, > thereby consisting of 28 threads across both machines. The group id for the > consumers are the same. But, what I observe is that all partitions are > being assigned to threads on a single machine. Now, I do understand that if > the task on the active machine fails, then the threads in the other machine > would take over. My question is that is there a way that kafka-streams can > auto-balance across instances of the same stream application ? If yes, how > do I go about doing that ? Please let me know. Thanks, > > > > Best, > > Karthik Prasad > > Senior Software Engineer > > Sony Interactive Entertainment > > > > > > > > -- Thanks, Guozhang *Guozhang Wang | Software Engineer | Confluent | +1 607.339.8352 <607.339.8352> * On Tue, Mar 21, 2017 at 4:56 PM, Matthias J. Sax <matth...@confluent.io> wrote: > Hi, > > I guess, it's currently not possible to load balance between different > machines. It might be a nice optimization to add into Streams though. > > Right now, you should reduce the number of threads. Load balancing is > based on threads, and thus, if Streams place tasks to all threads of one > machine, it will automatically assign the remaining tasks to thread of > the second machine. > > Btw: If you have only 9 input partitions, you will get most likely 9 > tasks (might be more, depending on your topology structure) and thus, > you cannot utilize more then 9 thread anyway. Thus, running with 28 > thread will most likely result in many idle threads. > > See the docs for more details: > > - > http://docs.confluent.io/current/streams/architecture. > html#parallelism-model > - > http://docs.confluent.io/current/streams/architecture.html#threading-model > > > > -Matthias > > On 3/21/17 3:40 PM, Prasad, Karthik wrote: > > Hey, > > > > I have a typical scenario of a kafka-streams application in a production > environment. > > > > We have a kafka-cluster with multiple topics. Messages from one topic is > being consumed by a the kafka-streams application. The topic, currently, > has 9 partitions. We have configured consumer thread count to 14. We are > running 2 instances of this stream application on 2 different machines, > thereby consisting of 28 threads across both machines. The group id for the > consumers are the same. But, what I observe is that all partitions are > being assigned to threads on a single machine. Now, I do understand that if > the task on the active machine fails, then the threads in the other machine > would take over. My question is that is there a way that kafka-streams can > auto-balance across instances of the same stream application ? If yes, how > do I go about doing that ? Please let me know. Thanks, > > > > Best, > > Karthik Prasad > > Senior Software Engineer > > Sony Interactive Entertainment > > > > > > > > -- -- Guozhang