Hello,

If a KafkaConsumer is subscribed to more than one topic, or is assigned more
than one partition of the same topic, what is the behavior of
KafkaConsumer.poll()?

In our use case, we would like to use, for example, "user id" as the key for
records in our topics. Naturally, we receive a lot more records for some
users than for others, so different partitions of the same topic will see
different record rates, some substantially higher than others. So the
questions I had were:

* Will KafkaConsumer.poll() return the same number of records for each
topic+partition combination? For example, if max records (max.poll.records)
is set to 500 and the consumer is assigned 5 partitions from 5 topics
(1 partition per topic), will poll() return 100 records for each
topic+partition?

* What happens if partitions have different rates and sizes of incoming
records? I suspect that if Kafka brokers return the same number of records
for each partition assigned to the consumer instance, partitions with a high
rate of incoming records may start falling behind. Or do brokers take the
lag of each partition into account when returning records for poll()?

* Separately, what happens if the partitions assigned to a consumer have
different brokers as leaders? How does poll() behave? For example, if a
consumer has 3 partitions assigned, spread across 3 different brokers, and
max records for poll is set to 300, will the consumer ask each broker for at
most 100 records?
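
To make the skew described above concrete, here is a rough sketch of how
keying by user id can leave partitions with very different record counts. It
is only an illustration: the user ids and record counts are made up, the
class and method names are hypothetical, and String.hashCode() is used as a
stand-in for the producer's partitioner, which is an assumption and not
Kafka's actual hash function.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class PartitionSkewSketch {

    // Stand-in partitioner (assumption): maps a key to a partition index.
    // The real Kafka producer uses a different hash; only the idea that
    // "same key -> same partition" matters here.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Sums the (hypothetical) number of records per user into the
    // partition that each user id maps to.
    static Map<Integer, Integer> countPerPartition(
            Map<String, Integer> recordsPerUser, int numPartitions) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int p = 0; p < numPartitions; p++) {
            counts.put(p, 0);
        }
        for (Map.Entry<String, Integer> e : recordsPerUser.entrySet()) {
            int partition = partitionFor(e.getKey(), numPartitions);
            counts.merge(partition, e.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up traffic: one very active user and several quiet ones.
        Map<String, Integer> recordsPerUser = new LinkedHashMap<>();
        recordsPerUser.put("user-1", 10_000);
        recordsPerUser.put("user-2", 50);
        recordsPerUser.put("user-3", 20);
        recordsPerUser.put("user-4", 5);

        // Prints partition index -> total records landing on it.
        System.out.println(countPerPartition(recordsPerUser, 3));
    }
}
```

Whichever partition the busy user lands on receives almost all of the
traffic, which is the situation the questions above are about.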

Thanks,
M
