Re: How to recover from a disk full situation in Kafka cluster?

Connie Yang Thu, 17 Jul 2014 17:44:31 -0700

It might appear that the data is not balanced, but it could be a result
of the imbalanced leader setting.


Does this mean that we should set "auto.leader.rebalance.enable" to true?
Are there any other configurations we need to change as well? As I
mentioned before, we pretty much use the default settings.
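For reference, here is a rough sketch of the per-broker check that, as we understand it, automatic leader rebalancing performs when auto.leader.rebalance.enable is true. For each broker, it takes the partitions that prefer that broker as leader (the broker is first in the replica list) and computes the fraction currently led by some other broker. That fraction is then compared against leader.imbalance.per.broker.percentage. LeaderImbalance and PartitionState are hypothetical names, the 10% threshold is an assumed example value, and this is a simplified illustration, not Kafka's actual controller code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified sketch of a per-broker leader-imbalance check (not Kafka's code).
class LeaderImbalance {
    /** One partition: assigned replicas (first = preferred leader) and current leader. */
    static final class PartitionState {
        final List<Integer> replicas;
        final int leader;

        PartitionState(List<Integer> replicas, int leader) {
            this.replicas = replicas;
            this.leader = leader;
        }
    }

    /**
     * For each broker, the fraction of partitions whose preferred replica is that
     * broker but which are currently led by a different broker.
     */
    static Map<Integer, Double> imbalanceRatios(List<PartitionState> partitions) {
        Map<Integer, Integer> preferred = new HashMap<>();
        Map<Integer, Integer> notLed = new HashMap<>();
        for (PartitionState p : partitions) {
            int pref = p.replicas.get(0);
            preferred.merge(pref, 1, Integer::sum);
            if (p.leader != pref) {
                notLed.merge(pref, 1, Integer::sum);
            }
        }
        Map<Integer, Double> ratios = new HashMap<>();
        for (Map.Entry<Integer, Integer> e : preferred.entrySet()) {
            int missing = notLed.getOrDefault(e.getKey(), 0);
            ratios.put(e.getKey(), (double) missing / e.getValue());
        }
        return ratios;
    }

    public static void main(String[] args) {
        // Partitions 0 and 1 both prefer broker 1, but broker 2 currently leads partition 1.
        List<PartitionState> parts = List.of(
            new PartitionState(List.of(1, 2), 1),
            new PartitionState(List.of(1, 2), 2),
            new PartitionState(List.of(2, 1), 2));
        double thresholdPercent = 10.0; // assumed leader.imbalance.per.broker.percentage
        imbalanceRatios(parts).forEach((broker, ratio) -> {
            boolean trigger = ratio * 100 > thresholdPercent;
            System.out.printf("broker %d: %.0f%% off preferred, rebalance=%b%n",
                broker, ratio * 100, trigger);
        });
    }
}
```

In this toy input, broker 1 prefers two partitions but leads only one (50%), so it would exceed a 10% threshold, while broker 2 leads everything it prefers.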

All of our topics have a replication factor of 2 (i.e., 2 copies of each message).

We don't have the topic output from when we had the problem, but here is our
topic output after we ran the kafka-preferred-replica-election.sh tool, as
suggested:

$KAFKA_HOME/bin/kafka-topics.sh --zookeeper zkHost1:2181,zkHost2:2181,zkHost3:2181 \
    --describe --topic=myKafkaTopic
Topic:myKafkaTopic PartitionCount:24 ReplicationFactor:2 Configs:retention.ms=43200000
Topic: myKafkaTopic Partition: 0 Leader: 2 Replicas: 2,1 Isr: 1,2
Topic: myKafkaTopic Partition: 1 Leader: 3 Replicas: 3,2 Isr: 3,2
Topic: myKafkaTopic Partition: 2 Leader: 4 Replicas: 4,3 Isr: 3,4
Topic: myKafkaTopic Partition: 3 Leader: 5 Replicas: 5,4 Isr: 5,4
Topic: myKafkaTopic Partition: 4 Leader: 6 Replicas: 6,5 Isr: 5,6
Topic: myKafkaTopic Partition: 5 Leader: 7 Replicas: 7,6 Isr: 6,7
Topic: myKafkaTopic Partition: 6 Leader: 8 Replicas: 8,7 Isr: 7,8
Topic: myKafkaTopic Partition: 7 Leader: 9 Replicas: 9,8 Isr: 9,8
Topic: myKafkaTopic Partition: 8 Leader: 10 Replicas: 10,9 Isr: 10,9
Topic: myKafkaTopic Partition: 9 Leader: 11 Replicas: 11,10 Isr: 11,10
Topic: myKafkaTopic Partition: 10 Leader: 12 Replicas: 12,11 Isr: 11,12
Topic: myKafkaTopic Partition: 11 Leader: 13 Replicas: 13,12 Isr: 12,13
Topic: myKafkaTopic Partition: 12 Leader: 14 Replicas: 14,13 Isr: 14,13
Topic: myKafkaTopic Partition: 13 Leader: 15 Replicas: 15,14 Isr: 14,15
Topic: myKafkaTopic Partition: 14 Leader: 16 Replicas: 16,15 Isr: 16,15
Topic: myKafkaTopic Partition: 15 Leader: 17 Replicas: 17,16 Isr: 16,17
Topic: myKafkaTopic Partition: 16 Leader: 18 Replicas: 18,17 Isr: 18,17
Topic: myKafkaTopic Partition: 17 Leader: 19 Replicas: 19,18 Isr: 18,19
Topic: myKafkaTopic Partition: 18 Leader: 20 Replicas: 20,19 Isr: 20,19
Topic: myKafkaTopic Partition: 19 Leader: 21 Replicas: 21,20 Isr: 20,21
Topic: myKafkaTopic Partition: 20 Leader: 22 Replicas: 22,21 Isr: 22,21
Topic: myKafkaTopic Partition: 21 Leader: 23 Replicas: 23,22 Isr: 23,22
Topic: myKafkaTopic Partition: 22 Leader: 24 Replicas: 24,23 Isr: 23,24
Topic: myKafkaTopic Partition: 23 Leader: 1 Replicas: 1,24 Isr: 1,24
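To check output like the above for leader and replica balance, a small parser along the lines of the following sketch can tally leaders and replicas per broker. It also flags partitions whose leader is not the preferred (first) replica, or whose ISR is shorter than the replica list. It assumes the per-partition line layout printed by the 0.8.1.1 describe tool shown above; other versions may format it differently. DescribeSummary and its fields are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: summarizes per-partition lines from kafka-topics.sh --describe.
// Assumes the "Partition: N Leader: L Replicas: a,b Isr: c,d" layout shown above.
class DescribeSummary {
    private static final Pattern LINE = Pattern.compile(
        "Partition:\\s*(\\d+)\\s+Leader:\\s*(-?\\d+)\\s+Replicas:\\s*([\\d,]+)\\s+Isr:\\s*([\\d,]*)");

    final Map<Integer, Integer> leadersPerBroker = new TreeMap<>();
    final Map<Integer, Integer> replicasPerBroker = new TreeMap<>();
    final List<Integer> notOnPreferredLeader = new ArrayList<>();
    final List<Integer> underReplicated = new ArrayList<>();

    static DescribeSummary parse(List<String> lines) {
        DescribeSummary s = new DescribeSummary();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (!m.find()) {
                continue; // skip the topic header line and anything unrecognized
            }
            int partition = Integer.parseInt(m.group(1));
            int leader = Integer.parseInt(m.group(2));
            String[] replicas = m.group(3).split(",");
            String isr = m.group(4);
            int isrSize = isr.isEmpty() ? 0 : isr.split(",").length;

            s.leadersPerBroker.merge(leader, 1, Integer::sum);
            for (String r : replicas) {
                s.replicasPerBroker.merge(Integer.parseInt(r), 1, Integer::sum);
            }
            // The first replica in the assignment is the preferred leader.
            if (leader != Integer.parseInt(replicas[0])) {
                s.notOnPreferredLeader.add(partition);
            }
            if (isrSize < replicas.length) {
                s.underReplicated.add(partition);
            }
        }
        return s;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "Topic:myKafkaTopic PartitionCount:2 ReplicationFactor:2 Configs:retention.ms=43200000",
            "Topic: myKafkaTopic Partition: 0 Leader: 2 Replicas: 2,1 Isr: 1,2",
            "Topic: myKafkaTopic Partition: 1 Leader: 2 Replicas: 3,2 Isr: 2");
        DescribeSummary s = parse(sample);
        System.out.println("leaders per broker:      " + s.leadersPerBroker);
        System.out.println("replicas per broker:     " + s.replicasPerBroker);
        System.out.println("not on preferred leader: " + s.notOnPreferredLeader);
        System.out.println("under-replicated:        " + s.underReplicated);
    }
}
```

By our reading of the 24-partition output above, such a tally should show each broker leading one partition and holding two replicas, with every leader on its preferred replica and every ISR complete. In other words, leadership looks balanced after the preferred replica election.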

Thanks,
Connie



On Thu, Jul 17, 2014 at 4:20 PM, Neha Narkhede <neha.narkh...@gmail.com>
wrote:

> Connie,
>
> > After we freed up the cluster disk space and adjusted the broker data
> > retention policy, we noticed that the cluster partitions were not
> > balanced, based on the topic describe script that comes with the Kafka
> > 0.8.1.1 distribution.
>
> When you say the cluster was not balanced, did you mean the leaders or the
> data? The describe topic tool does not give information about data sizes,
> so I'm assuming you are referring to leader imbalance. If so, the right
> tool to run is kafka-preferred-replica-election.sh, not partition
> reassignment. In general, assuming the partitions were evenly distributed
> on your cluster before you ran out of disk space, the only thing you should
> need to do to recover is delete a few older segments and bounce each
> broker, one at a time. It is also preferable to run preferred replica
> election after a complete cluster bounce so that the leaders are well
> distributed.
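Before deleting anything by hand, a read-only sketch like the one below could list the oldest segment files in one partition directory. It assumes the 0.8 on-disk layout, where each partition lives in a directory named <topic>-<partition> under log.dirs, and each segment is a <20-digit base offset>.log file with a matching .index file. It skips the newest (active) segment and deletes nothing. OldSegments and oldestInactiveSegments are hypothetical names, and we assume any actual removal would happen only while that broker is stopped.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Read-only sketch: lists the oldest inactive log segments in one partition
// directory (e.g. <log.dirs>/myKafkaTopic-0). It never deletes anything.
class OldSegments {
    // Assumes segment files are named <20-digit zero-padded base offset>.log.
    static List<Path> oldestInactiveSegments(Path partitionDir, int max) throws IOException {
        List<Path> logs = new ArrayList<>();
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(partitionDir, "*.log")) {
            for (Path p : ds) {
                logs.add(p);
            }
        }
        // Zero-padded offsets sort correctly as strings, so this is oldest first.
        logs.sort((a, b) -> a.getFileName().toString().compareTo(b.getFileName().toString()));
        if (logs.isEmpty()) {
            return logs;
        }
        // The segment with the highest base offset is the active one; exclude it.
        List<Path> inactive = logs.subList(0, logs.size() - 1);
        return new ArrayList<>(inactive.subList(0, Math.min(max, inactive.size())));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path seg : oldestInactiveSegments(dir, 3)) {
            System.out.println("candidate: " + seg + " (" + Files.size(seg) + " bytes)");
        }
    }
}
```

Sorting by base offset rather than file modification time keeps the ordering tied to the segment names themselves.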
>
> Also, it would help if you could send the output of the describe topic
> tool. I wonder whether your topics inadvertently have a replication factor of 1.
>
> Thanks,
> Neha
>
>
> On Thu, Jul 17, 2014 at 11:57 AM, Connie Yang <cybercon...@gmail.com>
> wrote:
>
> > Hi All,
> >
> > Our Kafka cluster ran out of disk space yesterday. After we freed up the
> > cluster disk space and adjusted the broker data retention policy, we
> > noticed that the cluster partitions were not balanced, based on the topic
> > describe script that comes with the Kafka 0.8.1.1 distribution. So, we
> > tried to rebalance the partitions using kafka-reassign-partitions.sh.
> > Some time later, we ran out of disk space on 2 brokers in the cluster,
> > while the rest had plenty of disk space left.
> >
> > This seems to suggest that only two brokers were receiving messages. We
> > have not changed the partitioning in our producer, which uses a random
> > partition key strategy.
> >
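> > import java.util.UUID;
> > import kafka.producer.KeyedMessage;
> >
> > // A random key per message should spread messages across partitions.
> > String uuid = UUID.randomUUID().toString();
> > KeyedMessage<String, String> data = new KeyedMessage<String, String>(
> >     "myKafkaTopic", uuid, msgBuilder.toString());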
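As a sanity check on the producer side, the following simulation sketches how random UUID keys would spread over our 24 partitions. It assumes the 0.8 producer's default partitioner maps a key to abs(key.hashCode()) % numPartitions. Masking the sign bit is our stand-in for abs, and PartitionSpread is a hypothetical class, not part of Kafka. If that assumption holds, every partition, and therefore every broker, should receive a roughly equal share, so the keying alone would not explain only two brokers filling up.

```java
import java.util.UUID;

// Simulation only: assumes a hash partitioner of the form
// abs(key.hashCode()) % numPartitions, with abs done by masking the sign bit.
class PartitionSpread {
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    // Counts how many random UUID keys land on each partition.
    static int[] countPerPartition(int messages, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (int i = 0; i < messages; i++) {
            String uuid = UUID.randomUUID().toString();
            counts[partitionFor(uuid, numPartitions)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = countPerPartition(240_000, 24);
        for (int p = 0; p < counts.length; p++) {
            System.out.println("partition " + p + ": " + counts[p]);
        }
    }
}
```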
> >
> >
> > Questions:
> > 1. Is partition reassignment required after a disk fills up, or when some
> > of the brokers are not healthy?
> > 2. Is there a broker config that we can use to automatically rebalance the
> > broker partitions? Should "auto.leader.rebalance.enable" be set to true?
> > 3. How do we recover from a situation like this?
> >
> > We pretty much use the default configuration on the brokers.
> >
> > Thanks,
> > Connie
> >
>
