Re: kafka streams consumer partition assignment is uneven

Gwen Shapira Tue, 17 Jan 2017 16:00:01 -0800

Btw, in case you haven't found this yet (I just discovered it...), you
can get the entire topology by starting the streams instance, waiting a bit
(until the threads have received their partition assignments) and then
printing KafkaStreams.toString() to the console.


I found it useful and cool :)


On Tue, Jan 17, 2017 at 3:19 PM, Matthias J. Sax <matth...@confluent.io> wrote:
> Sorry for answering late.
>
> The mapping from partitions to threads also depends on the structure of
> your topology. Since you mention that yours is quite complex, I assume
> that this is the reason for the uneven distribution. If you want to dig
> deeper, it would help to know the structure of your topology.
>
>
> -Matthias
>
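Matthias' point about topology structure can be made concrete with a small calculation. This is a minimal sketch, assuming the Kafka Streams rule that each sub-topology (a group of source topics connected by joins or other co-partitioned operations) creates one task per partition number, with the task count equal to the largest partition count among its source topics. The topic names `t0`..`t6` and the two groupings are hypothetical; they are not Ara's actual topology.

```python
# Hypothetical names t0..t6 stand in for Ara's 7 topics (12 partitions each).
TOPICS = [f"t{i}" for i in range(7)]
PARTITIONS = {t: 12 for t in TOPICS}
THREADS = 3 * 4  # 3 Streams nodes x 4 stream threads each


def tasks_for(sub_topologies, partitions):
    """List tasks as (sub_topology_index, partition) pairs.

    Assumes each sub-topology gets one task per partition number, up to the
    largest partition count among its source topics; the task for partition p
    reads partition p of every source topic in that sub-topology.
    """
    tasks = []
    for group_id, topics in enumerate(sub_topologies):
        n_tasks = max(partitions[t] for t in topics)
        tasks.extend((group_id, p) for p in range(n_tasks))
    return tasks


# Case 1: all 7 topics feed one sub-topology (e.g. they are all joined).
joined = tasks_for([TOPICS], PARTITIONS)
# Case 2: each topic is consumed by its own independent sub-topology.
independent = tasks_for([[t] for t in TOPICS], PARTITIONS)

print(len(joined), len(joined) / THREADS)            # 12 1.0
print(len(independent), len(independent) / THREADS)  # 84 7.0
```

In the first case each of the 12 tasks carries one partition of every topic, so one task per thread gives exactly the layout Ara expected. In the second case the assignor only sees 84 interchangeable single-partition tasks, and an even task count per thread does not by itself guarantee an even spread of each topic.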
> On 1/9/17 12:05 PM, Ara Ebrahimi wrote:
>> I meant I have 7 topics and each has 12 partitions. Considering that I have
>> 4 streaming threads per node (12 threads across the 3 nodes), I was
>> expecting each thread to process 1 partition from each topic, i.e. 7
>> partitions total per streaming thread. But that’s not the case. Or perhaps
>> you are saying the number of streaming threads should follow the total
>> number of partitions across all 7 topics?!
>>
>> Ara.
>>
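Ara's arithmetic (7 topics x 12 partitions = 84 partitions over 12 threads = 7 per thread) only describes totals. The toy comparison below shows why 7 tasks per thread need not mean 1 partition of each topic per thread. It assumes every topic is its own sub-topology (84 single-partition tasks) and uses two hypothetical assignors, one handing out contiguous blocks and one dealing round-robin; neither is the actual Kafka Streams assignor.

```python
from collections import Counter

TOPICS = [f"t{i}" for i in range(7)]  # hypothetical names for the 7 topics
PARTITIONS_PER_TOPIC = 12
THREADS = 3 * 4  # 3 nodes x 4 stream threads

# One single-partition task per (topic, partition), listed topic by topic.
TASKS = [(t, p) for t in TOPICS for p in range(PARTITIONS_PER_TOPIC)]


def contiguous(tasks, n_threads):
    """Hypothetical assignor: each thread takes one contiguous slice."""
    size = len(tasks) // n_threads
    return [tasks[i * size:(i + 1) * size] for i in range(n_threads)]


def round_robin(tasks, n_threads):
    """Hypothetical assignor: task i goes to thread i mod n_threads."""
    return [tasks[i::n_threads] for i in range(n_threads)]


def per_topic_counts(assignment):
    """For each thread, count how many partitions of each topic it owns."""
    return [Counter(topic for topic, _ in thread) for thread in assignment]


for name, assign in [("contiguous", contiguous), ("round_robin", round_robin)]:
    threads = assign(TASKS, THREADS)
    tasks_per_thread = sorted({len(t) for t in threads})
    worst = max(max(c.values()) for c in per_topic_counts(threads))
    print(f"{name}: tasks per thread {tasks_per_thread}, "
          f"most partitions of one topic on one thread: {worst}")
```

Both assignors give every thread exactly 7 tasks. The contiguous one lets a single thread own 7 partitions of one topic, while the round-robin one gives each thread exactly one partition of every topic.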
>>> On Jan 9, 2017, at 11:48 AM, Michael Noll <mich...@confluent.io> wrote:
>>>
>>> What does the processing topology of your Kafka Streams application look
>>> like, and what's the exact topic and partition configuration?  You say you
>>> have 12 partitions in your cluster, presumably across 7 topics -- that
>>> means that most topics have just a single partition.  Depending on your
>>> topology (e.g. if you have defined that single-partition topics A, B, C
>>> must be joined), Kafka Streams is forced to let one of your three Streams
>>> nodes process "more" topics/partitions than the other two nodes.
>>>
>>> -Michael
>>>
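Michael's example can be sketched numerically. The partition counts below are hypothetical: single-partition topics A, B and C are joined, and a topic D with 3 partitions is processed on its own. Assuming, as above, one task per partition number of each sub-topology, the joined topics collapse into a single task, and since a task cannot be split across nodes, whichever node receives it carries at least 3 partitions even though an ideal spread would be 2 per node.

```python
import math

# Hypothetical topology in the spirit of Michael's example: single-partition
# topics A, B and C are joined, while topic D (3 partitions) is processed alone.
PARTITIONS = {"A": 1, "B": 1, "C": 1, "D": 3}
SUB_TOPOLOGIES = [["A", "B", "C"], ["D"]]
NODES = 3


def tasks_for(sub_topologies, partitions):
    """One task per partition number of each sub-topology; a task holds
    partition p of every source topic in its sub-topology that has it."""
    tasks = []
    for topics in sub_topologies:
        for p in range(max(partitions[t] for t in topics)):
            tasks.append([(t, p) for t in topics if p < partitions[t]])
    return tasks


tasks = tasks_for(SUB_TOPOLOGIES, PARTITIONS)
total = sum(len(task) for task in tasks)
ideal = math.ceil(total / NODES)          # best case if partitions could move freely
floor = max(len(task) for task in tasks)  # a task cannot be split across nodes

print(len(tasks), total, ideal, floor)  # 4 6 2 3
```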
>>>
>>>
>>> On Mon, Jan 9, 2017 at 6:52 PM, Ara Ebrahimi <ara.ebrah...@argyledata.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have 3 kafka brokers, each with 4 disks. I have 12 partitions. I have 3
>>>> kafka streams nodes. Each is configured to have 4 streaming threads. My
>>>> topology is quite complex and I have 7 topics and lots of joins and states.
>>>>
>>>> What I have noticed is that each of the 3 kafka streams nodes gets
>>>> configured to process a variable number of partitions of a topic. One node
>>>> is assigned 2 partitions of topic a and another one is assigned 5. Hence I
>>>> end up with nonuniform throughput across these nodes: one node ends up
>>>> processing more data than the others.
>>>>
>>>> What’s going on? How can I make sure partition assignment to kafka
>>>> streams nodes is uniform?
>>>>
>>>> On a similar topic, is there a way to make sure partition assignment to
>>>> disks across kafka brokers is also uniform? Even if I use round-robin
>>>> assignment to pin partitions to brokers, there doesn’t seem to be a way to
>>>> uniformly pin partitions to disks. Or maybe I’m missing something here? I
>>>> end up with 2 partitions of topic a on disk 1 and 3 partitions of topic a
>>>> on disk 2. It’s a bit variable: not totally random, but not uniformly
>>>> distributed either.
>>>>
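On the disk question: Kafka brokers of this era place each newly created partition in the log directory (one of the `log.dirs` entries) that currently holds the fewest partitions, counting all topics. A rule that balances total counts per directory says nothing about how one topic's partitions are spread. The simulation below uses hypothetical pre-existing per-disk counts and assumes ties go to the lowest disk index; the broker's actual tie-breaking is not modeled.

```python
from collections import Counter


def place(n_new, disk_counts):
    """Place n_new partitions one at a time on the disk that currently holds
    the fewest partitions. Ties go to the lowest disk index (an assumption of
    this sketch). Returns the disk index chosen for each new partition."""
    counts = list(disk_counts)
    chosen = []
    for _ in range(n_new):
        disk = min(range(len(counts)), key=lambda d: counts[d])
        counts[disk] += 1
        chosen.append(disk)
    return chosen


# Hypothetical broker with 4 disks that already hold 3, 1, 1 and 2 partitions
# of other topics; topic a then adds 4 partitions to this broker.
skewed = place(4, [3, 1, 1, 2])
# Same broker starting from empty disks.
empty = place(4, [0, 0, 0, 0])

print(skewed, Counter(skewed))  # [1, 2, 1, 2]: disks 0 and 3 get none of topic a
print(empty)                    # [0, 1, 2, 3]: one partition per disk
```

In the skewed case the disk totals end up at 3, 3, 3 and 2, i.e. balanced overall, while topic a itself lands on only two of the four disks.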
>>>> Ara.
>>>>
>>>>
>>>>
>>>> ________________________________
>>>>
>>>> This message is for the designated recipient only and may contain
>>>> privileged, proprietary, or otherwise confidential information. If you have
>>>> received it in error, please notify the sender immediately and delete the
>>>> original. Any other use of the e-mail by you is prohibited. Thank you in
>>>> advance for your cooperation.
>>>>
>>>> ________________________________
>>>>
>>>
>>
>



-- 
Gwen Shapira
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog
