Hello All,

We are running Kafka 2.1.1 in production with 3 brokers.
We have noticed that when a broker has been stopped for more than 10
minutes and is then started again, we see a performance degradation of
around 90% for up to 4 minutes after start-up.
During this period (of around 4 minutes) we observe CPU usage dropping from
22% to 2% on all of the brokers. Also, the broker that has just been
started has network-out of 7 MB/min and network-in of 2.2 GB/min, whereas
each of the other brokers has network-out of 1.1 GB/min and network-in of
55 MB/min.
We assume this is because the broker that was stopped for more than 10
minutes must catch up with the messages that were produced while it was
down.
The performance degradation persists until all 3 brokers are in sync again
(we have min.insync.replicas=2 and a replication factor of 3).

It is worth mentioning that we handle ~5k messages per second with an
average size of ~3 KB.
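
As a rough sanity check of the catch-up assumption above, the following
Python sketch estimates how much data the restarted broker would have to
fetch and how long that would take at the observed network-in rate. It is
only a back-of-the-envelope estimate. It assumes that "3kb" means 3
kilobytes, that messages are not compressed, that the restarted broker
hosts a replica of every partition (3 brokers, replication factor 3), and
that live traffic runs at about 10% of normal during the catch-up (taken
from the ~90% degradation). The function names are made up for this
illustration.

```python
# Back-of-the-envelope estimate of the replication backlog after a restart.
# All inputs come from the figures in this mail; the interpretation of them
# (kilobytes, no compression, full replica set on the broker) is assumed.

MSGS_PER_SEC = 5_000            # ~5k messages per second
AVG_MSG_BYTES = 3_000           # ~3 KB average size (assumed kilobytes)
DOWNTIME_MIN = 10               # broker stopped for ~10 minutes
NETWORK_IN_MB_PER_MIN = 2_200   # observed network-in on the restarted broker
LIVE_TRAFFIC_FRACTION = 0.10    # assumed: ~90% degradation during catch-up


def produce_rate_mb_per_min(msgs_per_sec, avg_msg_bytes):
    """Cluster-wide produce rate in MB per minute."""
    return msgs_per_sec * avg_msg_bytes * 60 / 1_000_000


def backlog_mb(rate_mb_per_min, downtime_min):
    """Data the stopped broker missed while it was down, in MB."""
    return rate_mb_per_min * downtime_min


def catch_up_minutes(backlog, fetch_rate, live_rate):
    """Minutes to drain the backlog while live traffic keeps arriving."""
    if fetch_rate <= live_rate:
        raise ValueError("fetch rate must exceed the live produce rate")
    return backlog / (fetch_rate - live_rate)


rate = produce_rate_mb_per_min(MSGS_PER_SEC, AVG_MSG_BYTES)
backlog = backlog_mb(rate, DOWNTIME_MIN)
live = rate * LIVE_TRAFFIC_FRACTION
minutes = catch_up_minutes(backlog, NETWORK_IN_MB_PER_MIN, live)

print(f"produce rate: {rate:.0f} MB/min")
print(f"backlog:      {backlog:.0f} MB")
print(f"catch-up:     {minutes:.1f} min")
```

Under these assumptions the backlog is about 9 GB and the catch-up time
comes out at a little over 4 minutes, which is in line with the window we
observe. This does not prove the assumption, but the numbers are
consistent with it.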

We also tried increasing the number of broker nodes to 5 (rebalanced) and
running with the production configuration from
https://kafka.apache.org/081/documentation.html#prodconfig and still see
~35% performance degradation.

Thanks in advance.

Best regards,
Miroslav
