If you have 1000 partitions and 500 consumers, each consumer should be consuming 2 partitions. You can verify the actual assignment using ConsumerOffsetChecker. Which version of Kafka are you using? If it's 0.8, you may want to take a look at this FAQ entry ("Why data is not evenly distributed among partitions when partitioning key is not specified"): https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whydataisnotevenlydistributedamongpartitionswhenpartitioningkeyisnotspecified
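To make the "2 partitions each" arithmetic concrete, here is a minimal Python sketch, assuming the range-style assignment that the 0.8 high-level consumer performs during a rebalance. It is not Kafka's code, and the function name `range_assign` and the consumer/thread ids are hypothetical. The idea: sort the partitions and the consumer thread ids, give each thread `len(partitions) // len(threads)` partitions, and give one extra partition to the first `len(partitions) % len(threads)` threads. Each consumer instance computes this on the client side from the group membership it reads from ZooKeeper, so every instance arrives at the same split without a server-side coordinator. Note that the unit being balanced is the consumer thread (stream), not the server.

```python
from typing import Dict, List


def range_assign(partitions: List[int], consumer_threads: List[str]) -> Dict[str, List[int]]:
    """Sketch of range-style assignment of one topic's partitions to threads.

    Hypothetical helper illustrating the idea; not Kafka's implementation.
    """
    if not consumer_threads:
        return {}
    parts = sorted(partitions)          # partitions sorted numerically
    threads = sorted(consumer_threads)  # thread ids sorted as strings
    per_thread, extra = divmod(len(parts), len(threads))
    assignment: Dict[str, List[int]] = {}
    for pos, thread in enumerate(threads):
        # The first `extra` threads each take one additional partition.
        start = per_thread * pos + min(pos, extra)
        count = per_thread + (1 if pos < extra else 0)
        assignment[thread] = parts[start:start + count]
    return assignment


if __name__ == "__main__":
    # 1000 partitions, 500 consumers with one thread each.
    single = range_assign(list(range(1000)), [f"consumer-{i:03d}" for i in range(500)])
    print(single["consumer-000"])  # [0, 1]
    print(single["consumer-499"])  # [998, 999]

    # Same topic, but 500 servers x 4 threads = 2000 threads (hypothetical ids).
    multi = range_assign(
        list(range(1000)),
        [f"server-{s:03d}-{t}" for s in range(500) for t in range(4)],
    )
    idle = sum(1 for owned in multi.values() if not owned)
    print(idle)  # 1000 threads own no partition
```

The second case suggests why thread count matters: with more consumer threads than partitions, the surplus threads receive nothing. Separately, if the producer sends messages without a key, the FAQ entry above explains why most data may land in only a few partitions regardless of how evenly those partitions are assigned.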
Thanks,
Jun

On Sun, Dec 29, 2013 at 12:18 PM, Jay Beavers <j...@hikinghomeschoolers.org> wrote:

> I've been trying to use Kafka to feed data into a computing cluster (e.g.
> 500 servers). The basic design is one 'job submitter' server acting as a
> Producer into a topic with 1000 partitions. I then have 500 servers, each
> running an instance of a multithreaded High Level Consumer, all with a
> shared group.id, that asynchronously process incoming messages against a
> CPU-intensive workload. My expectation was that Kafka would use server-side
> logic to map the topic partitions onto the different consumer instances in
> the shared group. My goal is to be able to add and remove consumer
> instances over the lifetime of the processing and have Kafka automatically
> rebalance the partitions across the set of live consumer instances.
>
> This hasn't been working well for me: in practice I've seen one or two of
> my cluster servers pick up messages while the others sit idle. I suspect
> that each High Level Consumer is picking up partition 0 and ZooKeeper is
> getting confused about which instance/socket to map the messages into.
> After reading through the docs a few more times, I think the partition ->
> group mapping logic is client side rather than server side. If that is the
> case, I think my scenario is fundamentally broken unless I implement an
> independent service for partition -> client mapping. I've looked through
> the Simple Consumer example, and the partition mapping logic is handled
> client side there too, so it seems to lead me back down the path of
> writing my own partitioning service.
>
> Can you confirm my understanding that partition -> consumer mapping is
> client-side logic? Is there an established pattern I should follow to use
> Kafka in a "1 Producer -> Many Consumer Instances in a Shared Group"
> scenario?
>
> Thanks in advance for your advice,
>
> - jcb