It might appear that the data is not balanced, but that could be a result of the leader imbalance.
Does this mean that we should set "auto.leader.rebalance.enable" to true? Is there any other configuration we need to change as well? As I mentioned before, we pretty much use the default settings.

All of our topics have a replication factor of 2 (i.e., 2 copies per message).

We don't have the topic output from when we had the problem, but here's our topic output after we ran the kafka-preferred-replica-election.sh tool as suggested:

$KAFKA_HOME/bin/kafka-topics.sh --zookeeper zkHost1:2181,zkHost2:2181,zkHost3:2181 --describe --topic=myKafkaTopic
Topic:myKafkaTopic PartitionCount:24 ReplicationFactor:2 Configs: retention.ms=43200000
Topic: myKafkTopic Partition: 0 Leader: 2 Replicas: 2,1 Isr: 1,2
Topic: myKafkTopic Partition: 1 Leader: 3 Replicas: 3,2 Isr: 3,2
Topic: myKafkTopic Partition: 2 Leader: 4 Replicas: 4,3 Isr: 3,4
Topic: myKafkTopic Partition: 3 Leader: 5 Replicas: 5,4 Isr: 5,4
Topic: myKafkTopic Partition: 4 Leader: 6 Replicas: 6,5 Isr: 5,6
Topic: myKafkTopic Partition: 5 Leader: 7 Replicas: 7,6 Isr: 6,7
Topic: myKafkTopic Partition: 6 Leader: 8 Replicas: 8,7 Isr: 7,8
Topic: myKafkTopic Partition: 7 Leader: 9 Replicas: 9,8 Isr: 9,8
Topic: myKafkTopic Partition: 8 Leader: 10 Replicas: 10,9 Isr: 10,9
Topic: myKafkTopic Partition: 9 Leader: 11 Replicas: 11,10 Isr: 11,10
Topic: myKafkTopic Partition: 10 Leader: 12 Replicas: 12,11 Isr: 11,12
Topic: myKafkTopic Partition: 11 Leader: 13 Replicas: 13,12 Isr: 12,13
Topic: myKafkTopic Partition: 12 Leader: 14 Replicas: 14,13 Isr: 14,13
Topic: myKafkTopic Partition: 13 Leader: 15 Replicas: 15,14 Isr: 14,15
Topic: myKafkTopic Partition: 14 Leader: 16 Replicas: 16,15 Isr: 16,15
Topic: myKafkTopic Partition: 15 Leader: 17 Replicas: 17,16 Isr: 16,17
Topic: myKafkTopic Partition: 16 Leader: 18 Replicas: 18,17 Isr: 18,17
Topic: myKafkTopic Partition: 17 Leader: 19 Replicas: 19,18 Isr: 18,19
Topic: myKafkTopic Partition: 18 Leader: 20 Replicas: 20,19 Isr: 20,19
Topic: myKafkTopic Partition: 19 Leader: 21 Replicas: 21,20 Isr: 20,21
Topic: myKafkTopic Partition: 20 Leader: 22 Replicas: 22,21 Isr: 22,21
Topic: myKafkTopic Partition: 21 Leader: 23 Replicas: 23,22 Isr: 23,22
Topic: myKafkTopic Partition: 22 Leader: 24 Replicas: 24,23 Isr: 23,24
Topic: myKafkTopic Partition: 23 Leader: 1 Replicas: 1,24 Isr: 1,24

Thanks,
Connie

On Thu, Jul 17, 2014 at 4:20 PM, Neha Narkhede <neha.narkh...@gmail.com> wrote:

> Connie,
>
> After we freed up the
> cluster disk space and adjusted the broker data retention policy, we
> noticed that the cluster partition was not balanced based on topic describe
> script came from Kafka 0.8.1.1 distribution.
>
> When you say the cluster was not balanced, did you mean the leaders or the
> data? The describe topic tool does not give information about data sizes,
> so I'm assuming you are referring to leader imbalance. If so, the right
> tool to run is kafka-preferred-replica-election.sh not partition
> reassignment. In general, assuming the partitions were evenly distributed
> on your cluster before you ran out of disk space, the only thing you should
> need to do to recover is delete a few older segments and bounce each
> broker, one at a time. It is also preferable to run preferred replica
> election after a complete cluster bounce so the leaders are well
> distributed.
>
> Also, it will help if you can send around the output of the describe topic
> tool. I wonder if your topics have a replication factor of 1 inadvertently?
>
> Thanks,
> Neha
>
>
> On Thu, Jul 17, 2014 at 11:57 AM, Connie Yang <cybercon...@gmail.com>
> wrote:
>
> > Hi All,
> >
> > Our Kafka cluster ran out of disk space yesterday. After we freed up the
> > cluster disk space and adjusted the broker data retention policy, we
> > noticed that the cluster partition was not balanced based on topic describe
> > script came from Kafka 0.8.1.1 distribution. So, we tried to rebalance the
> > partition using the kafka-reassign-partitions.sh.
> > Some time later, we ran out of disk space on 2 brokers in the cluster
> > while the rest have plenty of disk space left.
> >
> > This seems to suggest that only two brokers were receiving messages. We
> > have not changed the broker partition from our producer which uses a random
> > partition key strategy.
> >
> > String uuid = UUID.randomUUID().toString();
> > KeyedMessage<String, String> data = new KeyedMessage<String, String>(
> >     "myKafkaTopic",
> >     uuid, msgBuilder.toString());
> >
> > Questions
> > 1. Is partition reassignment required after disk full or when some of the
> > brokers are not healthy?
> > 2. Is there a broker config that we can use to auto rebalance the broker
> > partition? Should "auto.leader.rebalance.enable" be set to true?
> > 3. How do we recover from a situation like this?
> >
> > We pretty much use default configuration on the broker.
> >
> > Thanks,
> > Connie
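
For anyone who wants to check leader balance from the describe output in this thread without eyeballing every row, here is a minimal sketch. It is not part of the Kafka distribution: the class name LeaderBalance and its two methods are made up for illustration, and it assumes the row layout shown in the output above ("Partition: N Leader: N Replicas: a,b ..."). It relies on the fact that the preferred leader of a partition is the first broker in its Replicas list. The third sample row is hypothetical and only there to show a non-preferred leader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a Kafka API: summarizes rows of
// "kafka-topics.sh --describe" output as pasted in this thread.
public class LeaderBalance {
    // Matches a partition row; the "PartitionCount:" header row does not match.
    private static final Pattern ROW = Pattern.compile(
        "Partition:\\s*(\\d+)\\s+Leader:\\s*(-?\\d+)\\s+Replicas:\\s*([\\d,]+)");

    /** Counts how many partitions each broker id currently leads. */
    public static Map<Integer, Integer> leadersPerBroker(List<String> rows) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String row : rows) {
            Matcher m = ROW.matcher(row);
            if (m.find()) {
                counts.merge(Integer.parseInt(m.group(2)), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Lists partitions whose leader is not the first (preferred) replica. */
    public static List<Integer> nonPreferredLeaders(List<String> rows) {
        List<Integer> partitions = new ArrayList<>();
        for (String row : rows) {
            Matcher m = ROW.matcher(row);
            if (m.find()) {
                int leader = Integer.parseInt(m.group(2));
                int preferred = Integer.parseInt(m.group(3).split(",")[0]);
                if (leader != preferred) {
                    partitions.add(Integer.parseInt(m.group(1)));
                }
            }
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
            "Topic: myKafkTopic Partition: 0 Leader: 2 Replicas: 2,1 Isr: 1,2",
            "Topic: myKafkTopic Partition: 1 Leader: 3 Replicas: 3,2 Isr: 3,2",
            // Hypothetical row: broker 3 leads although broker 4 is preferred.
            "Topic: myKafkTopic Partition: 2 Leader: 3 Replicas: 4,3 Isr: 3,4");
        System.out.println("Leaders per broker: " + leadersPerBroker(rows));
        System.out.println("Non-preferred leaders: " + nonPreferredLeaders(rows));
    }
}
```

If every broker leads roughly the same number of partitions and the second list is empty, the leaders are balanced. As Neha points out above, the describe output says nothing about data sizes, so this check cannot tell you whether the data itself is balanced.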