Sorry for the late reply. The mapping from partitions to threads also depends on the structure of your topology. Since you mention that yours is quite complex, I assume this is the reason for the uneven distribution. If you want to dig deeper, it would be helpful to know the structure of your topology.
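To illustrate why the topology matters, here is a rough Python sketch of how Kafka Streams derives tasks from a topology. It relies on the documented rule that each sub-topology gets as many tasks as the largest partition count among its source topics, and that task p reads partition p of each of those topics. The helper names `build_tasks` and `round_robin` are made up for this example, and the plain round-robin placement is a simplifying assumption: the real assignor balances task counts and prefers keeping state where it already lives, so it does not promise an even spread per topic.

```python
def build_tasks(sub_topologies):
    """Return {task_id: [(topic, partition), ...]}.

    sub_topologies is a list of dicts mapping source topic -> partition count.
    Each sub-topology yields max(partition counts) tasks; task p owns
    partition p of every source topic that has a partition p.
    """
    tasks = {}
    for group_id, topics in enumerate(sub_topologies):
        for p in range(max(topics.values())):
            task_id = f"{group_id}_{p}"  # same shape as Streams task ids, e.g. "0_3"
            tasks[task_id] = [(t, p) for t, n in topics.items() if p < n]
    return tasks


def round_robin(tasks, num_threads):
    """Hypothetical placement: deal task ids out to threads in order.

    This is NOT the real Kafka Streams assignor, only an idealized baseline.
    """
    assignment = {i: [] for i in range(num_threads)}
    for i, task_id in enumerate(tasks):
        assignment[i % num_threads].append(task_id)
    return assignment


topic_names = list("abcdefg")  # 7 topics, 12 partitions each, as in the thread

# Case 1: all 7 topics joined in a single sub-topology.
joined = build_tasks([{t: 12 for t in topic_names}])

# Case 2: every topic processed by its own independent sub-topology.
independent = build_tasks([{t: 12} for t in topic_names])

threads = 3 * 4  # 3 Streams nodes with 4 threads each

print(len(joined), len(independent))            # 12 84
print(round_robin(joined, threads)[0])          # ['0_0']
print(len(round_robin(independent, threads)[0]))  # 7
```

In the first case there are only 12 tasks, each carrying 7 partitions; in the second there are 84 single-partition tasks. A real topology with joins, repartition topics, and state stores usually falls somewhere in between, and because tasks (not topic partitions) are the unit of assignment, the number of partitions of any one topic that lands on a node can vary.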
-Matthias

On 1/9/17 12:05 PM, Ara Ebrahimi wrote:
> I meant I have 7 topics and each has 12 partitions. Considering that I have 4
> streaming threads per node, I was expecting to see each thread process 1
> partition from each topics and 7 partitions total per streaming thread. But
> that’s not the case. Or perhaps you are saying the number of streaming
> threads should follow the total number of partitions across all 7 topics?!
>
> Ara.
>
>> On Jan 9, 2017, at 11:48 AM, Michael Noll <mich...@confluent.io> wrote:
>>
>> What does the processing topology of your Kafka Streams application look
>> like, and what's the exact topic and partition configuration? You say you
>> have 12 partitions in your cluster, presumably across 7 topics -- that
>> means that most topics have just a single partition. Depending on your
>> topology (e.g. if you have defined that single-partition topics A, B, C
>> must be joined), Kafka Streams is forced to let one of your three Streams
>> nodes process "more" topics/partitions than the other two nodes.
>>
>> -Michael
>>
>> On Mon, Jan 9, 2017 at 6:52 PM, Ara Ebrahimi <ara.ebrah...@argyledata.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I have 3 kafka brokers, each with 4 disks. I have 12 partitions. I have 3
>>> kafka streams nodes. Each is configured to have 4 streaming threads. My
>>> topology is quite complex and I have 7 topics and lots of joins and states.
>>>
>>> What I have noticed is that each of the 3 kafka streams nodes gets
>>> configured to process variables number of partitions of a topic. One node
>>> is assigned to process 2 partitions of topic a and another one gets
>>> assigned 5. Hence I end up with nonuniform throughput across these nodes.
>>> One node ends up processing more data than the other.
>>>
>>> What’s going on? How can I make sure partitions assignment to kafka
>>> streams nodes is uniform?
>>>
>>> On a similar topic, is there a way to make sure partition assignment to
>>> disks across kafka brokers is also uniform? Even if I use a round-robin one
>>> to pin partitions to broker, but there doesn’t seem to be a way to
>>> uniformly pin partitions to disks. Or maybe I’m missing something here? I
>>> end up with 2 partitions of topic a on disk 1 and 3 partitions of topic a
>>> on disk 2. It’s a bit variable. Not totally random, but it’s not uniformly
>>> distributed either.
>>>
>>> Ara.
>>>
>>> ________________________________
>>>
>>> This message is for the designated recipient only and may contain
>>> privileged, proprietary, or otherwise confidential information. If you have
>>> received it in error, please notify the sender immediately and delete the
>>> original. Any other use of the e-mail by you is prohibited. Thank you in
>>> advance for your cooperation.
>>>
>>> ________________________________