Two out of our three Kafka nodes have become unrecoverable due to disk
corruption. I launched two new nodes, but they were assigned new broker IDs.
To redistribute the topics across the cluster, I ran:
---
/opt/kafka/bin/kafka-reassign-partitions.sh --broker-list
"1003,1005,1006,1007" --execute --zookeeper
zookeeper.service.consul:2181/kafka --reassignment-json-file
finalreassing.json
---
Most of the topics were reassigned correctly within half an hour; however,
11 of them stayed in progress for more than 72 hours, and in addition:
1) they had more replicas than assigned
2) some of their replicas were still assigned to the dead brokers.
As an example, consider the topic www:
- The initial replica assignment before the disaster: 1001,1002,1003
- Brokers 1001 and 1002 are lost permanently; there is no getting them back
- I ran kafka-reassign-partitions.sh --zookeeper zookeeper:2181/kafka
--broker-list "1003,1005,1006" --execute --reassignment-json-file
reassign.json
- The relevant entry in the JSON file is:
  {"topic":"www","partition":0,"replicas":[1003,1005,1006]}
- The process of reassigning starts, but never ends
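For completeness, the entry above has to sit inside the wrapper that
kafka-reassign-partitions.sh expects. A minimal sketch that generates and
sanity-checks such a file (file name and broker IDs taken from the example
above):

```shell
# Write a complete reassignment file for topic "www", partition 0.
# Broker IDs 1003/1005/1006 are the ones from the example above.
cat > reassign.json <<'EOF'
{
  "version": 1,
  "partitions": [
    {"topic": "www", "partition": 0, "replicas": [1003, 1005, 1006]}
  ]
}
EOF

# Sanity-check that the file is valid JSON before handing it to
# kafka-reassign-partitions.sh.
python3 -m json.tool reassign.json
```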
When I run kafka-topics.sh --describe ..., the assignment is not updated.
The reassignment never completed, and I had to kill it by deleting the
ZooKeeper node /kafka/admin/reassign-partitions.
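For reference, the way I cancelled the stuck reassignment was roughly the
following (this talks to the live ZooKeeper ensemble, so it is shown only
as a sketch, with the same connect string as above):

```shell
# Deleting this znode aborts the in-flight reassignment tracked by the
# controller; it does not undo replica movements already completed.
/opt/kafka/bin/zookeeper-shell.sh zookeeper.service.consul:2181 \
  delete /kafka/admin/reassign-partitions
```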
I also tried shrinking the replica set to just the surviving broker by
running the reassignment script with this JSON:
{"topic":"www","partition":0,"replicas":[1003]}
but that does not seem to work either.
As a last resort, I updated the topic's assignment directly on the
ZooKeeper node /kafka/brokers/topics/www. The node is updated, but Kafka
instantly reports that the replicas are caught up, even though there were
around 20 GB of data in the topic.
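For reference, the data on that znode maps each partition to its replica
list; what I edited looked like this (broker IDs mirror the example above):

```json
{"version":1,"partitions":{"0":[1003,1005,1006]}}
```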
Bottom line: I am not able to reassign replicas for the topics that are
still assigned to dead brokers. What would be a workaround to do this
without losing data?