Hi all,

We are building a system that will carry a high volume of traffic (on the order of 2 billion messages per batch), which we need to process at a rate of 50,000 messages per second, with guaranteed at-least-once delivery for each message. The downstream system we feed has a latency of 50 ms per message but can absorb many concurrent requests; at 50,000 messages per second and 50 ms per request, that works out to roughly 50,000 × 0.050 = 2,500 requests in flight at any moment.
We have a Kafka 0.8.1.1 cluster with three brokers and a ZooKeeper 3.4.5 ensemble with five nodes, all on physical hardware. We intend to deploy a consumer group of 2,500 consumers against a single topic, with one partition per consumer. We expect our consumers to be stable over the course of the run, so rebalancing should be rare.

In testing, we have successfully run 512 high-level consumers against 1,024 partitions, but beyond 512 consumers the rebalance at startup does not complete within 10 minutes. Is this a workable strategy with high-level consumers? Can we actually deploy a consumer group with this many consumers and partitions? We already see throughput of more than 500,000 messages per second with our 512 consumers, but we need greater parallelism to drive the 50 ms downstream call at our target rate.
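For concreteness, here is a minimal sketch of the consumer each of the 2,500 processes would run, using the 0.8 high-level consumer API. The topic name, group id, ZooKeeper hosts, and rebalance tuning values are placeholders, not our production settings:

    import java.util.Collections;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;

    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.ConsumerIterator;
    import kafka.consumer.KafkaStream;
    import kafka.javaapi.consumer.ConsumerConnector;

    public class BatchConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("zookeeper.connect", "zk1:2181,zk2:2181,zk3:2181,zk4:2181,zk5:2181");
            props.put("group.id", "batch-load");       // placeholder group name
            props.put("auto.commit.enable", "false");  // commit only after processing,
                                                       // for at-least-once delivery
            props.put("rebalance.max.retries", "10");  // raised for the large group
            props.put("rebalance.backoff.ms", "10000");

            ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

            // One stream per process; each of the 2,500 processes runs this same
            // code and should own exactly one partition after the rebalance.
            Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                connector.createMessageStreams(Collections.singletonMap("messages", 1));

            ConsumerIterator<byte[], byte[]> it = streams.get("messages").get(0).iterator();
            while (it.hasNext()) {
                byte[] message = it.next().message();
                // ~50 ms downstream call goes here
                // Coarse per-message commit; commits all offsets owned by this
                // connector. Batching commits would reduce ZooKeeper load.
                connector.commitOffsets();
            }
        }
    }

-- Jack Foy <j...@whitepages.com>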