Hello, I've run into a weird situation with Kafka 0.8.1.1. I had a running cluster that I wanted to extend with new brokers. The sequence was as follows:
1. I added the brokers to the cluster and ensured that they appeared under /brokers/ids.
2. Ran the reassign-partitions tool to redistribute the data and load evenly across all the brokers. There were 2 topics with 200 partitions each, both with replication factor 3.
3. Data transfer between replicas was too slow, so I decided to increase num.replica.fetchers from 1 to 4 to speed up the process. I adjusted the broker configuration and began a rolling restart, one broker at a time. During the restarts I noticed lots of errors in the logs, such as "topic is in the process of being deleted" (which obviously wasn't true) and "incorrect LeaderAndIsr received". I had no idea what to do about them, so I repeated the restart for some brokers.
4. Waited for a while so that the replicas caught up.
5. Ran preferred-replica-election and finished the process.

Observations:

When I ran kafka-topics.sh --list during the reassignment, I saw more than 3 replicas for some partitions in the "Replicas" field. I assumed this was expected, since a partition might be assigned to a completely different set of replicas that does not overlap with the original one, so the old and new replicas would be listed together until the move completes.

The bad thing is that this situation has not changed since then. I still see 4-6 replicas in "Replicas" and "ISR" for many partitions, even though kafka-topics.sh --under-replicated does not show anything. What is worse, kafka-topics.sh --describe shows that "Replication factor" has changed to 5 for one topic and 6 for the other!

I wonder how Kafka could have increased the replication factor in this way. Any ideas on how I can get my topics back to replication factor 3 would be appreciated.
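To make the goal concrete, here is a minimal sketch of how a reassignment file that trims every partition back to 3 replicas could be generated. The topic name, broker ids, and the `trim_replicas` helper are hypothetical and only for illustration. The output follows the JSON layout that the reassign-partitions tool reads via --reassignment-json-file. Whether feeding such a file to the tool actually clears the state described above is an assumption, not something I have confirmed.

```python
import json


def trim_replicas(assignment, target_rf=3):
    """Build a reassignment plan that keeps only the first target_rf replicas.

    assignment maps (topic, partition) to the replica list currently shown
    by kafka-topics.sh --describe. Order is preserved, so the first replica
    (the preferred leader) stays the same.
    """
    partitions = []
    for (topic, partition), replicas in sorted(assignment.items()):
        partitions.append({
            "topic": topic,
            "partition": partition,
            "replicas": list(replicas[:target_rf]),
        })
    return {"version": 1, "partitions": partitions}


# Hypothetical current state: partitions that ended up with 5 and 6 replicas.
current = {
    ("my-topic", 0): [1, 2, 3, 4, 5],
    ("my-topic", 1): [2, 3, 4, 5, 6, 1],
}

plan = trim_replicas(current)
print(json.dumps(plan, indent=2))
```

Keeping the first 3 replicas is just the simplest choice for the sketch. In practice the kept replicas would have to be picked so that the load stays balanced across the brokers.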