btw, in case you haven't found this out yet (I just discovered it...), you can get the entire topology by starting the stream, waiting a bit, and then printing "KafkaStreams.toString()" to the console.
I found it useful and cool :)

On Tue, Jan 17, 2017 at 3:19 PM, Matthias J. Sax <matth...@confluent.io> wrote:
> Sorry for answering late.
>
> The mapping from partitions to threads also depends on the structure of
> your topology. As you mention that yours is quite complex, I assume that
> this is the reason for the uneven distribution. If you want to dig
> deeper, it would be helpful to know the structure of your topology.
>
>
> -Matthias
>
> On 1/9/17 12:05 PM, Ara Ebrahimi wrote:
>> I meant I have 7 topics and each has 12 partitions. Considering that I
>> have 4 streaming threads per node, I was expecting each thread to
>> process 1 partition from each topic, i.e. 7 partitions total per
>> streaming thread. But that's not the case. Or perhaps you are saying
>> the number of streaming threads should follow the total number of
>> partitions across all 7 topics?!
>>
>> Ara.
>>
>>> On Jan 9, 2017, at 11:48 AM, Michael Noll <mich...@confluent.io> wrote:
>>>
>>> What does the processing topology of your Kafka Streams application
>>> look like, and what's the exact topic and partition configuration?
>>> You say you have 12 partitions in your cluster, presumably across 7
>>> topics -- that means that most topics have just a single partition.
>>> Depending on your topology (e.g. if you have defined that
>>> single-partition topics A, B, C must be joined), Kafka Streams is
>>> forced to let one of your three Streams nodes process "more"
>>> topics/partitions than the other two nodes.
>>>
>>> -Michael
>>>
>>>
>>>
>>> On Mon, Jan 9, 2017 at 6:52 PM, Ara Ebrahimi <ara.ebrah...@argyledata.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have 3 Kafka brokers, each with 4 disks. I have 12 partitions. I
>>>> have 3 Kafka Streams nodes. Each is configured to have 4 streaming
>>>> threads. My topology is quite complex, and I have 7 topics and lots
>>>> of joins and states.
>>>>
>>>> What I have noticed is that each of the 3 Kafka Streams nodes gets
>>>> configured to process a variable number of partitions of a topic.
>>>> One node is assigned to process 2 partitions of topic a, and another
>>>> one gets assigned 5. Hence I end up with nonuniform throughput
>>>> across these nodes. One node ends up processing more data than the
>>>> others.
>>>>
>>>> What's going on? How can I make sure partition assignment to Kafka
>>>> Streams nodes is uniform?
>>>>
>>>> On a similar topic, is there a way to make sure partition assignment
>>>> to disks across Kafka brokers is also uniform? Even if I use a
>>>> round-robin assignment to pin partitions to brokers, there doesn't
>>>> seem to be a way to uniformly pin partitions to disks. Or maybe I'm
>>>> missing something here? I end up with 2 partitions of topic a on
>>>> disk 1 and 3 partitions of topic a on disk 2. It's a bit variable.
>>>> Not totally random, but it's not uniformly distributed either.
>>>>
>>>> Ara.
>>>>
>>>> ________________________________
>>>>
>>>> This message is for the designated recipient only and may contain
>>>> privileged, proprietary, or otherwise confidential information. If
>>>> you have received it in error, please notify the sender immediately
>>>> and delete the original. Any other use of the e-mail by you is
>>>> prohibited. Thank you in advance for your cooperation.
>>>>
>>>> ________________________________
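To illustrate Matthias's point that the partition-to-thread mapping depends on the topology's structure: Kafka Streams balances *tasks* (roughly, one per sub-topology per partition), not per-topic partitions. Here is a minimal, hypothetical sketch in plain Java of why a simple round-robin over tasks can still leave instances unevenly loaded -- the numbers and the `assign` helper are made up for illustration, and the real assignor additionally weighs stickiness and state, so this is only a toy model:

```java
import java.util.Arrays;

public class TaskAssignmentSketch {

    // Distribute stream tasks over threads round-robin. In Kafka Streams a
    // task corresponds to a (sub-topology, partition) pair, so the unit
    // being balanced is tasks -- which is why per-topic partition counts
    // can look uneven even when total task load is balanced.
    static int[] assign(int numTasks, int numThreads) {
        int[] tasksPerThread = new int[numThreads];
        for (int task = 0; task < numTasks; task++) {
            tasksPerThread[task % numThreads]++;
        }
        return tasksPerThread;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: a sub-topology whose 7 input partitions
        // are spread over 3 instances. 7 does not divide evenly by 3, so
        // one instance necessarily carries an extra task.
        System.out.println(Arrays.toString(assign(7, 3))); // prints [3, 2, 2]
    }
}
```

The takeaway: whenever the task count of some sub-topology is not a multiple of the thread count, perfectly uniform assignment is impossible, and a complex topology with joins multiplies the number of sub-topologies involved.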
--
Gwen Shapira
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog