Hi all,

We are building a system that will carry a high volume of traffic (on the order of 2 billion messages per batch), which we need to process at a rate of 50,000 messages per second, with guaranteed at-least-once delivery for each message. The downstream system we feed has a latency of 50 ms per message but can absorb many concurrent requests; at 50,000 messages per second and 50 ms per request, that works out to roughly 50,000 × 0.050 = 2,500 requests in flight at any moment.
We have a Kafka 0.8.1.1 cluster with three brokers and a ZooKeeper 3.4.5 ensemble with five nodes, all on physical hardware. We intend to deploy a consumer group of 2,500 consumers against a single topic, with one partition per consumer. We expect our consumers to be stable over the course of the run, so rebalancing should be rare.

In testing, we have successfully run 512 high-level consumers against 1,024 partitions, but beyond 512 consumers the rebalance at startup does not complete within 10 minutes. Is this a workable strategy with high-level consumers? Can we actually deploy a consumer group with this many consumers and partitions? We already see throughput of more than 500,000 messages per second with our 512 consumers, but we need greater parallelism to drive the 50 ms downstream call at our target rate.
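For concreteness, here is a minimal sketch of the consumer each of the 2,500 processes would run, using the 0.8 high-level consumer API. The topic name, group id, ZooKeeper hosts, and rebalance tuning values are placeholders, not our production settings:

    import java.util.Collections;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;

    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.ConsumerIterator;
    import kafka.consumer.KafkaStream;
    import kafka.javaapi.consumer.ConsumerConnector;

    public class BatchConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("zookeeper.connect", "zk1:2181,zk2:2181,zk3:2181,zk4:2181,zk5:2181");
            props.put("group.id", "batch-load");       // placeholder group name
            props.put("auto.commit.enable", "false");  // commit only after processing,
                                                       // for at-least-once delivery
            props.put("rebalance.max.retries", "10");  // raised for the large group
            props.put("rebalance.backoff.ms", "10000");

            ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

            // One stream per process; each of the 2,500 processes runs this same
            // code and should own exactly one partition after the rebalance.
            Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                connector.createMessageStreams(Collections.singletonMap("messages", 1));

            ConsumerIterator<byte[], byte[]> it = streams.get("messages").get(0).iterator();
            while (it.hasNext()) {
                byte[] message = it.next().message();
                // ~50 ms downstream call goes here
                // Coarse per-message commit; commits all offsets owned by this
                // connector. Batching commits would reduce ZooKeeper load.
                connector.commitOffsets();
            }
        }
    }

-- Jack Foy <j...@whitepages.com>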