Two out of three of our Kafka nodes have become unrecoverable due to disk corruption. I launched two new nodes, but they came up with new broker IDs. To redistribute the topics across the cluster, I ran:

---
/opt/kafka/bin/kafka-reassign-partitions.sh --broker-list "1003,1005,1006,1007" --execute --zookeeper zookeeper.service.consul:2181/kafka --reassignment-json-file finalreassing.json
---
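The file passed to --reassignment-json-file follows the standard format kafka-reassign-partitions.sh expects. A minimal sketch of how finalreassing.json is structured (the second topic name is a placeholder; the real file lists every partition being moved onto the surviving brokers):

---
{
  "version": 1,
  "partitions": [
    {"topic": "www", "partition": 0, "replicas": [1003, 1005, 1006]},
    {"topic": "some-other-topic", "partition": 0, "replicas": [1005, 1006, 1007]}
  ]
}
---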
Most of the topics were reassigned correctly within half an hour, but 11 of them stayed "in progress" for more than 72 hours, and for those: 1) they had more replicas than assigned, and 2) some of their replicas were still on the dead brokers.

For example, consider the topic www:

- The replica assignment before the disaster was 1001,1002,1003.
- We lost brokers 1001 and 1002 permanently; there is no getting them back.
- I ran: kafka-reassign-partitions.sh --zookeeper zookeeper:2181/kafka --broker-list "1003,1005,1006" --execute --reassignment-json-file reassign.json
- The entry for this partition in reassign.json is: {"topic":"www","partition":0,"replicas":[1003,1005,1006]}
- The reassignment starts but never finishes. When I run kafka-topics.sh --describe ..., the assignment is not updated.

Because the reassignment never completed, I had to kill it by deleting the ZK node /kafka/admin/reassign-partitions.

I also tried shrinking the replica set to a single surviving broker with the reassignment script and this JSON, but that does not seem to work either: {"topic":"www","partition":0,"replicas":[1003]}

As a last resort, I updated the topic assignment directly on the ZK node /kafka/brokers/topics/www. The znode is updated, but Kafka instantly reports that the replicas are caught up, even though there was around 20 GB of data in the topic.

Bottom line: I cannot reassign replicas for the topics whose replicas sit on dead brokers. What would be a workaround that does not lose data?
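For reference, the ZooKeeper side of those last steps looked roughly like this; a sketch only, assuming the zookeeper-shell.sh bundled with Kafka under /opt/kafka/bin and the /kafka chroot used in the commands above (the znode payload in the comment is illustrative, not copied from the cluster):

---
# Inspect the stuck reassignment request, then cancel it by deleting the znode
/opt/kafka/bin/zookeeper-shell.sh zookeeper.service.consul:2181 get /kafka/admin/reassign-partitions
/opt/kafka/bin/zookeeper-shell.sh zookeeper.service.consul:2181 delete /kafka/admin/reassign-partitions

# The topic znode I edited by hand as a last resort; its payload holds the
# partition -> replica map, e.g. {"version":1,"partitions":{"0":[1001,1002,1003]}}
/opt/kafka/bin/zookeeper-shell.sh zookeeper.service.consul:2181 get /kafka/brokers/topics/www
---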