If you have 1000 partitions and 500 consumers, each consumer should be
consuming 2 partitions. You can verify this using ConsumerOffsetChecker.
Which version of Kafka are you using? If it's 0.8 and your producer doesn't
specify a partitioning key, you may want to take a look at
https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whydataisnotevenlydistributedamongpartitionswhenpartitioningkeyisnotspecified
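
To make the 1000 / 500 arithmetic concrete, here is a rough sketch of a
range-style split, which is how I understand the high level consumer divides
partitions among the members of a group during a rebalance. The function name
range_assign and the consumer ids are made up for illustration; this is not
Kafka's API, only a model of the arithmetic.

```python
def range_assign(num_partitions, consumer_ids):
    """Split partitions 0..num_partitions-1 into contiguous ranges.

    Hypothetical helper: consumers are sorted by id, each one gets
    num_partitions // len(consumers) partitions, and the first
    num_partitions % len(consumers) consumers get one extra.
    """
    consumers = sorted(consumer_ids)
    per_consumer, extra = divmod(num_partitions, len(consumers))
    assignment = {}
    for pos, cid in enumerate(consumers):
        # Earlier consumers that received an extra partition shift the start.
        start = per_consumer * pos + min(pos, extra)
        count = per_consumer + (1 if pos < extra else 0)
        assignment[cid] = list(range(start, start + count))
    return assignment


# Made-up ids standing in for the 500 consumer instances.
consumers = ["consumer-%03d" % i for i in range(500)]
assignment = range_assign(1000, consumers)
print(assignment["consumer-000"])  # [0, 1]
print(assignment["consumer-499"])  # [998, 999]
```

Under the same assumptions, a group with more consumers than partitions
leaves the surplus consumers with an empty list, i.e. idle.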

Thanks,

Jun


On Sun, Dec 29, 2013 at 12:18 PM, Jay Beavers
<j...@hikinghomeschoolers.org> wrote:

> I've been trying to use Kafka to feed data into a computing cluster (e.g.
> 500 servers).  The basic design is one 'job submitter' server is a Producer
> into a Topic with 1000 partitions.  I then have 500 servers each running an
> instance of a multithreaded High Level Consumer all with a shared
> group.id that asynchronously process incoming messages against a CPU
> intensive workload.  My expectation was that Kafka would use server side
> logic to map the topic partitions into the different consumer instances in
> the shared group.  My goal is to be able to join and leave consumer
> instances over the lifetime of the processing and have Kafka automatically
> rebalance the partitions to the set of live Consumer instances.
>
> This hasn't been working well for me -- in practice I've seen one or two of
> my cluster servers pick up messages and the others sit idle.  I suspect
> that each High Level Consumer is picking up partition 0 and ZooKeeper is
> getting confused about which instance/socket to map the messages into.
> After reading through the docs a few more times, I think the partition ->
> group mapping logic is client side rather than server side -- if this is
> the case I think my scenario is fundamentally broken unless I implement an
> independent service for partition -> client mapping.  I've looked through
> the Simple Consumer example and it looks like the partition mapping logic
> is handled client side there so it seems to lead me back down the path of
> writing my own partitioning service.
>
> Can you confirm my understanding that partition -> consumer mapping is
> client side logic?  Is there an established pattern I should be following
> to use Kafka in a 1 Producer -> Many Consumers Instances in a Shared Group
> scenario?
>
> Thanks in advance for your advice,
>
>  - jcb
>
