Hello,

We are starting to use Kafka in production, but we have run into behavior that was unexpected (at least for me) in how partitions are assigned. We have many topics with a few partitions each, and we consume all the data with several consumers in a single consumer group.
The problem is in the rebalance step. The rebalance divides partitions among the consumers one topic at a time. So if you have 100 topics with only 2 partitions each and 10 consumers, only two consumers will receive any partitions and the other eight will sit idle. That is, for each topic, all of its partitions are listed and handed out to the consumers in the consumer group in order (not randomly), so the same first consumers are picked for every topic. This behavior is also described in Algorithm 1 of the original Kafka paper [1].

I don't understand this decision. Why is the split done per topic? Would it make sense to divide all partitions from all topics among all the consumers in the consumer group instead? I don't see the reason for the current design, so I would like to hear your opinion before changing the code.

We are using Kafka 0.7.1.

Thank you in advance,
Pablo

[1] Jay Kreps, Neha Narkhede and Jun Rao. "Kafka: a Distributed Messaging System for Log Processing". http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
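
P.S. To make the behavior concrete, here is a minimal Python sketch of what I believe is happening. It is a simplified model of per-topic assignment as I understand it, not the actual Kafka code. The function names (`assign_per_topic`, `assign_globally`) and the topic and consumer ids are made up for illustration, and the "global" variant is only one possible way to spread partitions across all topics.

```python
from collections import defaultdict


def assign_per_topic(topics, consumers):
    """Simplified model: split each topic's partitions independently,
    always walking the consumers in the same sorted order."""
    owned = defaultdict(list)
    members = sorted(consumers)
    for topic, n_parts in sorted(topics.items()):
        per_consumer, extra = divmod(n_parts, len(members))
        for i, consumer in enumerate(members):
            # The first `extra` consumers each take one additional partition.
            start = per_consumer * i + min(i, extra)
            count = per_consumer + (1 if i < extra else 0)
            owned[consumer].extend((topic, p) for p in range(start, start + count))
    return owned


def assign_globally(topics, consumers):
    """Hypothetical alternative: pool every (topic, partition) pair
    and deal the pairs out round-robin."""
    owned = defaultdict(list)
    members = sorted(consumers)
    pairs = [(t, p) for t, n in sorted(topics.items()) for p in range(n)]
    for i, pair in enumerate(pairs):
        owned[members[i % len(members)]].append(pair)
    return owned


# The scenario from this message: 100 topics, 2 partitions each, 10 consumers.
topics = {f"topic-{i:03d}": 2 for i in range(100)}
consumers = [f"consumer-{i:02d}" for i in range(10)]

per_topic = assign_per_topic(topics, consumers)
global_rr = assign_globally(topics, consumers)

busy_per_topic = sorted(c for c in consumers if per_topic[c])
busy_global = sorted(c for c in consumers if global_rr[c])

print(len(busy_per_topic), "consumers own partitions with per-topic assignment")
print(len(busy_global), "consumers own partitions with global assignment")
```

In this model the per-topic version gives 100 partitions each to the first two consumers and nothing to the rest, while the pooled version gives 20 partitions to each of the 10 consumers.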