Hi,

My Kafka version is 0.8.2.2, the replication factor is 2, and auto.leader.rebalance.enable=true.
I stopped a broker in my cluster and started it again a few minutes later. The broker was busy catching up on a huge lag and hit its 120MB/s disk write limit. In addition, there are 23 partitions whose only live replica is on this broker, so shortly after it came up it became the leader of those 23 partitions.

About 10 minutes after this broker came up, network traffic and the messages-in rate on all brokers started to fall. After another 5 minutes the messages-in rate had dropped to nearly zero. Producers were only able to send messages to the cluster again once I stopped this broker.

1. When a broker reaches its 120MB/s disk write limit, how does that affect the other brokers and the producers?
2. What caused the problems on the other brokers and the producers?
3. If the restarted broker (broker 3) had become the leader of NO partitions after the restart, would it still have caused problems when it hit the disk write limit while catching up?
4. If the answer to question 3 is 'yes', what should I do to safely restart a broker that carries heavy traffic?
5. Does auto.leader.rebalance.enable=true do anything harmful?

Thanks