Hi Stefan,

Have you looked at the output of the following command? It shows the message distribution across the topic-partitions and which topic-partition is consumed by which consumer thread:
kafaka-server/bin>./kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --zkconnect localhost:2181 --group <consumer_group_name>

Jagbir

On Wed, Jul 15, 2015 at 12:50 PM, Stefan Miklosovic <mikloso...@gmail.com> wrote:
> I have the following problem. I tried almost everything I could, but
> without any luck.
>
> All I want to do is to have 1 producer, 1 topic, 10 partitions and 10
> consumers.
>
> All I want is to send 1M messages via the producer to these 10
> consumers.
>
> I am using Kafka 0.8.3 built from current upstream, so I have
> bleeding-edge stuff. It does not work on 0.8.1.1 or on the 0.8.2
> stream either.
>
> The problem is that I expect that when I send 1 million messages via
> that producer, all consumers will be busy. In other words, if each
> message sent via the producer goes to a randomly chosen partition
> (roundrobin / range), I expect all 10 consumers to process about 100k
> messages each, because the producer sends each message to a random
> one of these 10 partitions.
>
> But I have never achieved such an outcome.
>
> I was trying these combinations:
>
> 1) old scala producer vs old scala consumer
>
> The consumer was created by Consumers.createJavaConsumer() ten times.
> Every consumer runs in a separate thread.
>
> 2) old scala producer vs new java consumer
>
> The new consumer was used like this: I have 10 consumers listening to
> a topic, each subscribed to 1 partition (consumer 1 - partition 1,
> consumer 2 - partition 2 and so on).
>
> 3) old scala producer with custom partitioner
>
> I even tried to use my own partitioner: I just generated a random
> number from 0 to 9, so I expected the messages to be sent randomly to
> the partition with that number.
>
> All I see is that only a couple of these 10 consumers are utilized.
> Even though I am sending 1M messages, all I get from the debugging
> output is some preselected set of consumers which appear to be
> selected randomly.
>
> Do you have ANY hint why all consumers are not utilized even though
> partitions are selected randomly?
>
> My initial suspicion was that rebalancing was done badly. The thing
> was that I was creating the old consumers in a loop quickly, one after
> another, and I can imagine that the rebalancing algorithm got mad.
>
> So I abandoned this solution and thought: let's just subscribe these
> consumers one by one to some partition, so I will have 1 consumer
> subscribed to just 1 partition and there will not be any rebalancing
> at all.
>
> Oh my, how wrong I was ... nothing changed.
>
> So I was thinking that if I have 10 consumers, each one subscribed to
> 1 partition, maybe the producer is just sending messages to some
> subset of partitions and that's it. I was not sure how this could be
> possible, so to be super sure about the even spreading of messages
> across partitions, I used a custom partitioner class in the old
> producer, so I could be sure that the partition each message is sent
> to is super random.
>
> But that does not seem to work either.
>
> Please people, help me.
>
> --
> Stefan Miklosovic
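
As a baseline for reading the per-partition numbers, here is a minimal sketch of what a uniformly random partitioner should produce for 1M messages over 10 partitions. It is plain Python with no Kafka involved; the function names and the fixed seed are hypothetical and only serve the illustration. If the per-partition log sizes on the broker look very different from this, the messages are probably not being spread the way the partitioner is assumed to spread them.

```python
import random
from collections import Counter

NUM_PARTITIONS = 10        # from the thread: 1 topic, 10 partitions
NUM_MESSAGES = 1_000_000   # from the thread: 1M messages


def random_partition(num_partitions, rng):
    """Pick a uniform random partition index in 0..num_partitions-1,
    like the custom partitioner described in the quoted message."""
    return rng.randrange(num_partitions)


def simulate(num_messages=NUM_MESSAGES, num_partitions=NUM_PARTITIONS, seed=42):
    """Return a list with the number of messages assigned to each partition."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same counts
    counts = Counter(
        random_partition(num_partitions, rng) for _ in range(num_messages)
    )
    return [counts[p] for p in range(num_partitions)]


if __name__ == "__main__":
    for partition, count in enumerate(simulate()):
        print(f"partition {partition}: {count} messages")
```

Each partition should end up close to 100k messages, with deviations of a few hundred at most, so any partition that stays near zero points at the producer side rather than at the consumers.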