If the cluster is busy, the restarted broker will have a lot of data to catch up on (re-replicate) once it comes back online. What type is your underlying storage? Are you using SSDs?
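For a rough sense of the catch-up volume, here is a minimal back-of-the-envelope sketch in Python. It uses the figures from the original message (~5k messages/sec, ~3 KB average size, 10 minutes of downtime, ~2.2 GB/min network-in on the restarted broker). The function names are made up for illustration, and it assumes the restarted broker holds a replica of every partition (3 brokers, replication factor 3) and ignores messages produced while it is catching up, so treat the time as a lower bound.

```python
KIB = 1024
GIB = 1024 ** 3


def backlog_bytes(msgs_per_sec, avg_msg_bytes, downtime_sec):
    """Bytes produced to the cluster while the broker was down."""
    return msgs_per_sec * avg_msg_bytes * downtime_sec


def catch_up_minutes(backlog, fetch_bytes_per_min):
    """Minutes needed to fetch the backlog at a fixed fetch rate.

    Ignores new traffic arriving during catch-up (lower bound).
    """
    return backlog / fetch_bytes_per_min


# Figures taken from the original message; everything else is an assumption.
backlog = backlog_bytes(msgs_per_sec=5000, avg_msg_bytes=3 * KIB, downtime_sec=10 * 60)
minutes = catch_up_minutes(backlog, fetch_bytes_per_min=2.2 * GIB)

print(f"backlog: {backlog / GIB:.1f} GiB")    # backlog: 8.6 GiB
print(f"catch-up: {minutes:.1f} min")         # catch-up: 3.9 min
```

Under those assumptions the estimate lands close to the ~4 minutes of degradation you reported, which would be consistent with the slowdown lasting until the restarted broker has fetched its backlog.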
At 5k messages/sec with an average size of 3 KB, that is 15,000 KB/sec (about 14.6 MB/sec). So if your broker is down for 10 minutes, roughly 8.6 GB of data needs to be replicated to it, and the exact amount will again depend on the replication factor.

On Wed, Sep 16, 2020 at 3:09 PM Miroslav Tsvetanov <tsvetanov...@gmail.com> wrote:

> Hello All,
>
> We are running Kafka in production with 3 brokers and Kafka version 2.1.1.
> We have noticed that when a Kafka broker was stopped for more than 10
> minutes and we are starting it again, after the start-up we are facing
> degradation of around 90% for up to 4 minutes.
> During this period(of around 4 minutes) we observe CPU usage reduction from
> 22% to 2% at all of the brokers. Also, the broker which has just been
> started have network-out 7 MB/min and network-in 2.2 GB/min, on the other
> hand, the rest of the brokers has network out 1.1 GB/min and network-in 55
> MB/min.
> We assume that this is due to the fact that the broker, who has been
> stopped for more than 10 minutes, must catch up with the messages that have
> been processed during the time while he was stopped.
> The performance degradation persists until all 3 brokers become insync (we
> have min.insync.replicas=2 and replication factor of 3).
>
> It is worth mentioning that we have ~5k messages per/sec with an average
> size ~3kb.
>
> We also try to increase broker nodes to 5 (rebalanced) and run with
> https://kafka.apache.org/081/documentation.html#prodconfig and still see
> ~35% performance degradation.
>
> Thanks in advance.
>
> Best regards,
> Miroslav
>

--
Thanks & Regards
Ashutosh Singh
08151945559