consumer mapping is client side logic to the high level consumer, yes. partition -> client mapping is automatically handled by the high level consumer. this is organized by groupId
when groupId is the same the consumers operate together and consumers within the group do not see the same information. when groupId is different then they will see the same information and manage their offsets differently. the kafka high level consumer will rebalance consumers for partitions available. if every consumer has at most one partition and there are more consumers in the group still then those you should treat like consumer side fail over. so you can have twice the number of consumers you have partitions and lose 50% of your infrastructure and not lose any performance through the pipeline. /******************************************* Joe Stein Founder, Principal Consultant Big Data Open Source Security LLC http://www.stealth.ly Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop> ********************************************/ On Sun, Dec 29, 2013 at 3:18 PM, Jay Beavers <j...@hikinghomeschoolers.org>wrote: > I've been trying to use Kafka to feed data into a computing cluster (e.g. > 500 servers). The basic design is one 'job submitter' server is a Producer > into a Topic with 1000 partitions. I then have 500 servers each running an > instance of a multithreaded High Level Consumer all with a shared > group.idthat asynchronously process incoming messages against a CPU > intensive > workload. My expectation was that the Kafka would use server side logic to > map the topic partitions into the different consumer instances in the > shared group. My goal is to be able to join and leave consumer instances > over the lifetime of the processing and have Kafka automatically rebalance > the partitions to the set of live Consumer instances. > > This hasn't been working well for me -- in practice I've seen one or two of > my cluster servers pick up messages and the others sit idle. I suspect > that each High Level Consumer is picking up partition 0 and ZooKeeper is > getting confused about which instance/socket to map the messages into. > After reading through the docs a few more times, I think the partition -> > group mapping logic is client side rather than server side -- if this is > the case I think my scenario is fundamentally broken unless I implement an > independent service for partition -> client mapping. I've looked through > the Simple Consumer example and it looks like the partition mapping logic > is handled client side there so it seems to lead me back down the path of > writing my own partitioning service. > > Can you confirm my understanding that partition -> consumer mapping is > client side logic? Is there an established pattern I should be following > to use Kafka in a 1 Producer -> Many Consumers Instances in a Shared Group > scenario? > > Thanks in advance for your advice, > > - jcb >