The following is a bit weird. It indicates no leader for partition 4, which is inconsistent with what describe-topic shows.
2014-10-13 19:02:32,611 WARN [main] kafka.producer.BrokerPartitionInfo: Error while fetching metadata partition 4 leader: none replicas: 3 (tr-pan-hclstr-13.amers1b.ciscloud:9092),2 (tr-pan-hclstr-12.amers1b.ciscloud:9092),4 (tr-pan-hclstr-14.amers1b.ciscloud:9092) isr: isUnderReplicated: true for topic partition [wordcount,4]: [class kafka.common.LeaderNotAvailableException]

Any errors in the controller and state-change logs? Do you see broker 3 marked as dead in the controller log? Also, could you check whether the broker registration in ZK (https://cwiki.apache.org/confluence/display/KAFKA/Kafka+data+structures+in+Zookeeper) has the correct host/port?

Thanks,

Jun

On Mon, Oct 13, 2014 at 5:35 PM, Abraham Jacob <abe.jac...@gmail.com> wrote:

> Hi All,
>
> I have an 8-node Kafka cluster (broker.id 1..8). On this cluster I have a
> topic "wordcount", which has 8 partitions with a replication factor of 3.
>
> A describe of topic wordcount:
>
> # bin/kafka-topics.sh --describe --zookeeper tr-pan-hclstr-08.amers1b.ciscloud:2181/kafka/kafka-clstr-01 --topic wordcount
>
> Topic:wordcount    PartitionCount:8    ReplicationFactor:3    Configs:
>         Topic: wordcount    Partition: 0    Leader: 6    Replicas: 7,6,8    Isr: 6,7,8
>         Topic: wordcount    Partition: 1    Leader: 7    Replicas: 8,7,1    Isr: 7
>         Topic: wordcount    Partition: 2    Leader: 8    Replicas: 1,8,2    Isr: 8
>         Topic: wordcount    Partition: 3    Leader: 3    Replicas: 2,1,3    Isr: 3
>         Topic: wordcount    Partition: 4    Leader: 3    Replicas: 3,2,4    Isr: 3,2,4
>         Topic: wordcount    Partition: 5    Leader: 3    Replicas: 4,3,5    Isr: 3,5
>         Topic: wordcount    Partition: 6    Leader: 6    Replicas: 5,4,6    Isr: 6,5
>         Topic: wordcount    Partition: 7    Leader: 6    Replicas: 6,5,7    Isr: 6,5,7
>
> I wrote a simple producer to write to this topic.
> However, when running it I get these messages in the logs:
>
> 2014-10-13 19:02:32,459 INFO [main] kafka.client.ClientUtils$: Fetching metadata from broker id:0,host:tr-pan-hclstr-11.amers1b.ciscloud,port:9092 with correlation id 0 for 1 topic(s) Set(wordcount)
> 2014-10-13 19:02:32,464 INFO [main] kafka.producer.SyncProducer: Connected to tr-pan-hclstr-11.amers1b.ciscloud:9092 for producing
> 2014-10-13 19:02:32,551 INFO [main] kafka.producer.SyncProducer: Disconnecting from tr-pan-hclstr-11.amers1b.ciscloud:9092
> 2014-10-13 19:02:32,611 WARN [main] kafka.producer.BrokerPartitionInfo: Error while fetching metadata partition 4 leader: none replicas: 3 (tr-pan-hclstr-13.amers1b.ciscloud:9092),2 (tr-pan-hclstr-12.amers1b.ciscloud:9092),4 (tr-pan-hclstr-14.amers1b.ciscloud:9092) isr: isUnderReplicated: true for topic partition [wordcount,4]: [class kafka.common.LeaderNotAvailableException]
> 2014-10-13 19:02:33,505 INFO [main] kafka.producer.SyncProducer: Connected to tr-pan-hclstr-15.amers1b.ciscloud:9092 for producing
> 2014-10-13 19:02:33,543 WARN [main] kafka.producer.async.DefaultEventHandler: Produce request with correlation id 20611 failed due to [wordcount,5]: kafka.common.NotLeaderForPartitionException,[wordcount,6]: kafka.common.NotLeaderForPartitionException,[wordcount,7]: kafka.common.NotLeaderForPartitionException
> 2014-10-13 19:02:33,694 INFO [main] kafka.producer.SyncProducer: Connected to tr-pan-hclstr-18.amers1b.ciscloud:9092 for producing
> 2014-10-13 19:02:33,725 WARN [main] kafka.producer.async.DefaultEventHandler: Produce request with correlation id 20612 failed due to [wordcount,0]: kafka.common.NotLeaderForPartitionException
> 2014-10-13 19:02:33,861 INFO [main] kafka.producer.SyncProducer: Connected to tr-pan-hclstr-11.amers1b.ciscloud:9092 for producing
> 2014-10-13 19:02:33,983 WARN [main] kafka.producer.async.DefaultEventHandler: Failed to send data since partitions [wordcount,4] don't have a leader
>
> Obviously something is terribly wrong... I am quite new to Kafka, so these
> messages don't make much sense to me, except for the fact that some of the
> partitions don't have a leader.
>
> Could somebody be kind enough to explain the above messages?
>
> A few more questions:
>
> (1) How does one get into this state?
> (2) How can I get out of this state?
> (3) I have set auto.leader.rebalance.enable=true on all brokers. Shouldn't the partitions be balanced across all the brokers?
> (4) I can see that the Kafka service is running on all 8 nodes (I used ps ax -o "pid pgid args" and I can see the Kafka Java process).
> (5) Is there a way I can force a rebalance?
>
> Regards,
> Jacob
>
> --
> ~
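Regarding Jun's suggestion to verify the broker registration: per the wiki page he links, each live broker registers an ephemeral znode at /brokers/ids/<broker.id> whose value is a JSON document carrying the host and port clients should use. A minimal sketch of the host/port check, in Python; the znode payload below is made-up sample data following the documented structure, not something read from a live cluster:

```python
import json

# Hypothetical payload of the znode /brokers/ids/3, following the JSON
# structure described on the "Kafka data structures in Zookeeper" wiki
# page (the host/port/timestamp values are sample data for illustration).
znode_value = ('{"version":1,"host":"tr-pan-hclstr-13.amers1b.ciscloud",'
               '"jmx_port":-1,"port":9092,"timestamp":"1413247352611"}')

def registration_matches(raw_json, expected_host, expected_port):
    """Return True if the registered host/port match what clients expect."""
    reg = json.loads(raw_json)
    return reg.get("host") == expected_host and reg.get("port") == expected_port

print(registration_matches(znode_value, "tr-pan-hclstr-13.amers1b.ciscloud", 9092))
# prints True; a mismatch here would explain producers failing to reach the leader
```

To pull the real payload you can inspect /brokers/ids/3 with the bin/zookeeper-shell.sh tool that ships with Kafka, and question (5) is what bin/kafka-preferred-replica-election.sh is for: it asks the controller to move leadership back to each partition's preferred (first-listed) replica once the replicas are alive and in sync.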