Hi Moritz - I don’t believe the number of Kafka consumers is strictly restricted to the number of partitions, although within a single consumer group, any consumers beyond the partition count will sit idle.
When you create a topic with a given number of partitions and then produce keyed messages, each key-value pair is allocated to a specific partition on the basis of a hash function on the key. I believe the purpose of partitioning is to speed up consumption of Kafka data for a specific key within a Kafka topic. It essentially pre-sorts your topic’s data into as many categories as you have partitions. BTW, as of yet I haven’t figured out how to consume data from one of the partitions while ignoring the others.

Sent from my iPhone

> On May 6, 2019, at 9:30 PM, Kamal Chandraprakash <kamal.chandraprak...@gmail.com> wrote:
>
> 1. Yes, you may have to overprovision the number of partitions to handle
> the load peaks. Refer to this document
> <https://www.confluent.io/blog/how-choose-number-topics-partitions-kafka-cluster>
> to choose the no. of partitions.
> 2. KIP-429
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-429%3A+Kafka+Consumer+Incremental+Rebalance+Protocol>
> is proposed to reduce the time taken by the consumer rebalance protocol when a
> consumer instance is added/removed from the group.
>
> On Mon, May 6, 2019 at 7:47 PM Moritz Petersen <mpete...@adobe.com.invalid> wrote:
>
>> Hi all,
>>
>> I’m new to Kafka and have a very basic question:
>>
>> We build a cloud-scale platform and are evaluating whether we can use Kafka for
>> pub-sub messaging between our services. Most of our services scale
>> dynamically based on load (number of requests, CPU load, etc.). In our
>> current architecture, services are both producers and consumers, since all
>> services listen to some kind of events.
>>
>> With Kafka, I assume we have two restrictions or issues:
>>
>> 1. The number of consumers is restricted to the number of partitions of a
>> topic. Changing the number of partitions is a relatively expensive
>> operation (at least compared to scaling services). Is it necessary to
>> overprovision on the number of partitions in order to be prepared for load
>> peaks?
>> 2. Adding or removing consumers halts processing of the related
>> partition for a short period of time. Is it possible to avoid or
>> significantly minimize this lag?
>>
>> Are there any additional best practices to implement Kafka consumers in a
>> cloud-scale environment?
>>
>> Thanks,
>> Moritz
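P.S. The key-to-partition routing described above can be sketched in a few lines. Kafka's default partitioner hashes the message key (with murmur2) and takes the result modulo the partition count; the snippet below mimics that idea using MD5 purely so the example is deterministic and dependency-free - it is not Kafka's actual murmur2 implementation, and `pick_partition` is a hypothetical name for illustration only:

```python
import hashlib

def pick_partition(key: bytes, num_partitions: int) -> int:
    """Illustrative stand-in for Kafka's default partitioner:
    hash the message key, then take the hash modulo the number
    of partitions.  (Real Kafka clients use murmur2; MD5 is used
    here only to keep the sketch deterministic and stdlib-only.)"""
    digest = hashlib.md5(key).digest()
    h = int.from_bytes(digest[:4], "big")
    return h % num_partitions

# Every message with the same key maps to the same partition,
# which is what preserves per-key ordering within a topic.
p1 = pick_partition(b"user-42", 6)
p2 = pick_partition(b"user-42", 6)
assert p1 == p2
```

Note this also hints at why changing the partition count is expensive: with a different modulus, existing keys generally hash to different partitions, so per-key locality is lost for previously written data.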