Hi Moritz - I don’t believe the number of Kafka consumers is strictly restricted to the number of partitions, although within a single consumer group, any consumers beyond the partition count will sit idle.
When you create a topic with a given number of partitions and then produce keyed messages, each key-value pair is allocated to a specific partition on the basis of a hash function on the key. I believe the purpose of partitioning is to speed up consumption of Kafka data for a specific key within a Kafka topic. It essentially pre-sorts your topic’s data into as many categories as you have partitions. BTW, as of yet I haven’t figured out how to consume data from one of the partitions while ignoring the others.

Sent from my iPhone

> On May 6, 2019, at 9:30 PM, Kamal Chandraprakash <kamal.chandraprak...@gmail.com> wrote:
>
> 1. Yes, you may have to overprovision the number of partitions to handle
> the load peaks. Refer to this document
> <https://www.confluent.io/blog/how-choose-number-topics-partitions-kafka-cluster>
> to choose the no. of partitions.
> 2. KIP-429
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-429%3A+Kafka+Consumer+Incremental+Rebalance+Protocol>
> is proposed to reduce the time taken by the consumer rebalance protocol when a
> consumer instance is added/removed from the group.
>
> On Mon, May 6, 2019 at 7:47 PM Moritz Petersen <mpete...@adobe.com.invalid> wrote:
>
>> Hi all,
>>
>> I’m new to Kafka and have a very basic question:
>>
>> We build a cloud-scale platform and are evaluating whether we can use Kafka for
>> pub-sub messaging between our services. Most of our services scale
>> dynamically based on load (number of requests, CPU load, etc.). In our
>> current architecture, services are both producers and consumers, since all
>> services listen to some kind of events.
>>
>> With Kafka, I assume we have two restrictions or issues:
>>
>> 1. The number of consumers is restricted to the number of partitions of a
>> topic. Changing the number of partitions is a relatively expensive
>> operation (at least compared to scaling services). Is it necessary to
>> overprovision on the number of partitions in order to be prepared for load
>> peaks?
>> 2. Adding or removing consumers halts processing of the related
>> partition for a short period of time. Is it possible to avoid or
>> significantly minimize this lag?
>>
>> Are there any additional best practices to implement Kafka consumers in a
>> cloud-scale environment?
>>
>> Thanks,
>> Moritz
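P.S. The key-to-partition routing described above can be sketched in a few lines. Kafka's default partitioner hashes the message key (with murmur2) and takes the result modulo the partition count; the snippet below mimics that idea using MD5 purely so the example is deterministic and dependency-free - it is not Kafka's actual murmur2 implementation, and `pick_partition` is a hypothetical name for illustration only:

```python
import hashlib

def pick_partition(key: bytes, num_partitions: int) -> int:
    """Illustrative stand-in for Kafka's default partitioner:
    hash the message key, then take the hash modulo the number
    of partitions.  (Real Kafka clients use murmur2; MD5 is used
    here only to keep the sketch deterministic and stdlib-only.)"""
    digest = hashlib.md5(key).digest()
    h = int.from_bytes(digest[:4], "big")
    return h % num_partitions

# Every message with the same key maps to the same partition,
# which is what preserves per-key ordering within a topic.
p1 = pick_partition(b"user-42", 6)
p2 = pick_partition(b"user-42", 6)
assert p1 == p2
```

Note this also hints at why changing the partition count is expensive: with a different modulus, existing keys generally hash to different partitions, so per-key locality is lost for previously written data.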