Unfortunately the performance of consumer rebalancing scales poorly with
the number of partitions. This is one of the things the consumer rewrite
project is meant to address, but that work isn't complete yet. A reasonable
workaround may be to decouple your application parallelism from the number
of partitions, i.e. have the processing of each partition happen in a
thread pool. I'm assuming that you don't actually have 2,500 machines, just
that you need that much parallelism since each message takes some time to
process. This does weaken the delivery ordering, but you may be able to
shard the processing by key to solve that problem.
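
To make that concrete, here is a minimal sketch against the 0.8.x
high-level consumer: one reader thread pulls from a single stream and
dispatches each message to one of N single-threaded workers chosen by a
hash of the message key, so messages with the same key stay in order. The
topic name, group id, ZooKeeper address, worker count, and processMessage()
are placeholders rather than details from this thread, and the offset
handling needed for strict at-least-once delivery is left out.

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.MessageAndMetadata;

public class KeyShardedConsumer {

    // Per-process parallelism; a placeholder, tune to your per-message latency.
    private static final int NUM_WORKERS = 50;

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "zk1:2181"); // placeholder
        props.put("group.id", "my-group");          // placeholder
        ConsumerConnector connector =
            Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        // One single-threaded executor per shard: messages that hash to the
        // same shard are processed strictly in the order they were consumed.
        ExecutorService[] workers = new ExecutorService[NUM_WORKERS];
        for (int i = 0; i < NUM_WORKERS; i++) {
            workers[i] = Executors.newSingleThreadExecutor();
        }

        // Ask for a single stream; the pool above supplies the parallelism,
        // so this process needs only one partition rather than fifty.
        Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
        topicCountMap.put("my-topic", 1); // topic name is a placeholder

        KafkaStream<byte[], byte[]> stream =
            connector.createMessageStreams(topicCountMap).get("my-topic").get(0);

        ConsumerIterator<byte[], byte[]> it = stream.iterator();
        while (it.hasNext()) {
            final MessageAndMetadata<byte[], byte[]> msg = it.next();
            // Route by key so per-key ordering survives the handoff.
            int hash = Arrays.hashCode(msg.key()); // a null key hashes to 0
            final int shard = ((hash % NUM_WORKERS) + NUM_WORKERS) % NUM_WORKERS;
            workers[shard].submit(new Runnable() {
                public void run() {
                    processMessage(msg.key(), msg.message()); // placeholder hook
                }
            });
        }
    }

    private static void processMessage(byte[] key, byte[] value) {
        // Application-specific work goes here.
    }
}

One caveat: with the default auto-commit behavior, offsets can be committed
before the pool has finished a message, so a crash can lose in-flight work.
For true at-least-once delivery you'd disable auto-commit and only commit
once every dispatched message up to that offset has completed.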

-Jay

On Thu, Nov 6, 2014 at 10:59 AM, Jack Foy <j...@whitepages.com> wrote:

> Hi all,
>
> We are building a system that will carry a high volume of traffic (on the
> order of 2 billion messages in each batch), which we need to process at a
> rate of 50,000 messages per second. We need to guarantee at-least-once
> delivery for each message. The system we are feeding has a latency of 50ms
> per message, and can absorb many concurrent requests.
>
> We have a Kafka 0.8.1.1 cluster with three brokers and a ZooKeeper 3.4.5
> cluster with five nodes, each on physical hardware.
>
> We intend to deploy a consumer group of 2,500 consumers against a single
> topic, with a partition for each consumer. We expect our consumers to be
> stable over the course of the run, so we expect rebalancing to be rare. In
> testing, we have successfully run 512 high-level consumers against 1024
> partitions, but beyond 512 consumers the rebalance at startup doesn’t
> complete within 10 minutes. Is this a workable strategy with high-level
> consumers? Can we actually deploy a consumer group with this many consumers
> and partitions?
>
> We see throughput of more than 500,000 messages per second with our 512
> consumers, but we need greater parallelism to meet our performance needs.
>
> --
> Jack Foy <j...@whitepages.com>
>
