Sorry for answering late.

The mapping from partitions to threads also depends on the structure of
your topology. As you mention that you have a quite complex one, I
assume that this is the reason for the uneven distribution. If you want
to dig deeper, it would be helpful to know the structure of your topology.
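
To make this concrete, here is a rough Python sketch (a toy model, not
Kafka Streams code) of how the topology shapes the assignment. What it takes
from Kafka Streams: a topology is split into sub-topologies, each sub-topology
gets as many tasks as the largest partition count among its source topics, and
task j reads partition j of every source topic in that sub-topology. Everything
else is an assumption for illustration: the grouping of the 7 topics into
three sub-topologies is made up, and `assign_chunked` is a stand-in policy
that only balances task counts, not the real assignor.

```python
from collections import Counter


def build_tasks(sub_topologies, partitions):
    """Create one task per (sub-topology, partition index).

    The task count of a sub-topology is the maximum partition count
    among its source topics; task j owns partition j of each of them.
    """
    tasks = []
    for group_id, topics in enumerate(sub_topologies):
        n_tasks = max(partitions[t] for t in topics)
        for p in range(n_tasks):
            owned = [(t, p) for t in topics if p < partitions[t]]
            tasks.append((f"{group_id}_{p}", owned))
    return tasks


def assign_chunked(tasks, n_threads):
    """Hypothetical policy: give each thread an equally sized run of tasks."""
    size = -(-len(tasks) // n_threads)  # ceiling division
    return [tasks[i * size:(i + 1) * size] for i in range(n_threads)]


def partitions_per_topic(thread_tasks):
    """Count how many partitions of each topic a single thread ends up with."""
    return Counter(topic for _, owned in thread_tasks for topic, _ in owned)


partitions = {t: 12 for t in "abcdefg"}   # 7 topics, 12 partitions each
n_threads = 3 * 4                         # 3 nodes, 4 stream threads each

# Case 1: all topics are connected (e.g. joined) in one sub-topology.
one_group = [list("abcdefg")]
# Case 2: assumed split into three independent sub-topologies.
three_groups = [["a", "b", "c"], ["d", "e"], ["f", "g"]]

joined = assign_chunked(build_tasks(one_group, partitions), n_threads)
split = assign_chunked(build_tasks(three_groups, partitions), n_threads)

for name, assignment in (("one sub-topology", joined), ("three sub-topologies", split)):
    print(name)
    for i, thread_tasks in enumerate(assignment):
        print(f"  thread {i}: {len(thread_tasks)} task(s),",
              dict(partitions_per_topic(thread_tasks)))
```

In the first case there are 12 tasks, so every thread gets one task and
with it one partition of each topic. In the second case there are 36 tasks
and every thread still gets 3 of them, but in this toy policy four threads
hold 9 partitions (all from a, b, c) while the other eight hold 6. The task
counts are balanced; the partitions per topic are not.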


-Matthias

On 1/9/17 12:05 PM, Ara Ebrahimi wrote:
> I meant I have 7 topics and each has 12 partitions. Considering that I have 4 
> streaming threads per node, I was expecting to see each thread process 1 
> partition from each topic and 7 partitions total per streaming thread. But 
> that’s not the case. Or perhaps you are saying the number of streaming 
> threads should follow the total number of partitions across all 7 topics?!
> 
> Ara.
> 
>> On Jan 9, 2017, at 11:48 AM, Michael Noll <mich...@confluent.io> wrote:
>>
>> What does the processing topology of your Kafka Streams application look
>> like, and what's the exact topic and partition configuration?  You say you
>> have 12 partitions in your cluster, presumably across 7 topics -- that
>> means that most topics have just a single partition.  Depending on your
>> topology (e.g. if you have defined that single-partition topics A, B, C
>> must be joined), Kafka Streams is forced to let one of your three Streams
>> nodes process "more" topics/partitions than the other two nodes.
>>
>> -Michael
>>
>>
>>
>> On Mon, Jan 9, 2017 at 6:52 PM, Ara Ebrahimi <ara.ebrah...@argyledata.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I have 3 kafka brokers, each with 4 disks. I have 12 partitions. I have 3
>>> kafka streams nodes. Each is configured to have 4 streaming threads. My
>>> topology is quite complex and I have 7 topics and lots of joins and states.
>>>
>>> What I have noticed is that each of the 3 kafka streams nodes gets
>>> configured to process a variable number of partitions of a topic. One node
>>> is assigned to process 2 partitions of topic a and another one gets
>>> assigned 5. Hence I end up with nonuniform throughput across these nodes.
>>> One node ends up processing more data than the other.
>>>
>>> What’s going on? How can I make sure partition assignment to kafka
>>> streams nodes is uniform?
>>>
>>> On a similar topic, is there a way to make sure partition assignment to
>>> disks across kafka brokers is also uniform? Even if I use a round-robin
>>> scheme to pin partitions to brokers, there doesn’t seem to be a way to
>>> uniformly pin partitions to disks. Or maybe I’m missing something here? I
>>> end up with 2 partitions of topic a on disk 1 and 3 partitions of topic a
>>> on disk 2. It’s a bit variable. Not totally random, but it’s not uniformly
>>> distributed either.
>>>
>>> Ara.
>>>
>>>
>>>
>>> ________________________________
>>>
>>> This message is for the designated recipient only and may contain
>>> privileged, proprietary, or otherwise confidential information. If you have
>>> received it in error, please notify the sender immediately and delete the
>>> original. Any other use of the e-mail by you is prohibited. Thank you in
>>> advance for your cooperation.
>>>
>>> ________________________________
>>>
>>
> 

