We are using the EC2 EBS volume type "Throughput Optimized HDD (st1)" from AWS (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volume-types.html), with 3 brokers and a replication factor of 3. No data is lost: during this period only about 10% of the messages sent are accepted, the rest are delayed, and some of them can reach the timeout.
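As a rough sanity check on the catch-up estimate quoted below, here is a minimal sketch of the arithmetic. It is only a back-of-envelope model, not a description of how Kafka schedules replication. It assumes decimal units (3 KB = 3000 bytes), that with 3 brokers and replication factor 3 the stopped broker has to re-fetch everything produced while it was down, and that the 2.2 GB/min network-in we observed on the restarted broker is all replica fetch traffic. The function names are made up for illustration.

```python
# Back-of-envelope replica catch-up estimate.
# Assumptions (not from Kafka itself): decimal units, constant fetch rate,
# and no new produce traffic competing with the catch-up.

def backlog_bytes(msgs_per_sec: float, avg_msg_bytes: float, downtime_sec: float) -> float:
    """Bytes produced while the broker was down, i.e. what it must re-fetch."""
    return msgs_per_sec * avg_msg_bytes * downtime_sec


def catch_up_seconds(backlog: float, fetch_bytes_per_sec: float) -> float:
    """Time to drain the backlog at a constant fetch rate."""
    if fetch_bytes_per_sec <= 0:
        raise ValueError("fetch rate must be positive")
    return backlog / fetch_bytes_per_sec


if __name__ == "__main__":
    # Figures taken from this thread: ~5k msgs/sec, ~3 KB each, 10 minutes down.
    backlog = backlog_bytes(msgs_per_sec=5_000, avg_msg_bytes=3_000, downtime_sec=10 * 60)

    # Observed network-in on the restarted broker: 2.2 GB/min.
    fetch_rate = 2.2e9 / 60

    seconds = catch_up_seconds(backlog, fetch_rate)
    print(f"backlog:  {backlog / 1e9:.1f} GB")
    print(f"catch-up: {seconds / 60:.1f} min")
```

With these numbers the backlog comes out at about 9 GB and the catch-up at roughly 4 minutes, which is in line with the degradation window we see, so the replica catch-up explanation looks plausible.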

On Wed, Sep 16, 2020 at 1:30 PM Ashutosh singh <getas...@gmail.com> wrote:

> If the cluster is busy then it will have lots of data to rebalance once the
> broker comes online. What type is your underlying storage? Are you using
> SSD?
>
> 5k/sec and avg size 3kb i.e. 15000Kb (14.6 MB/sec). So if your broker
> is down for 10 minutes then approx 8 GB data need to rebalance and again it
> will depend on replication factor.
>
> On Wed, Sep 16, 2020 at 3:09 PM Miroslav Tsvetanov <tsvetanov...@gmail.com>
> wrote:
>
> > Hello All,
> >
> > We are running Kafka in production with 3 brokers and Kafka version 2.1.1.
> > We have noticed that when a Kafka broker was stopped for more than 10
> > minutes and we are starting it again, after the start-up we are facing
> > degradation of around 90% for up to 4 minutes.
> > During this period (of around 4 minutes) we observe CPU usage reduction from
> > 22% to 2% at all of the brokers. Also, the broker which has just been
> > started have network-out 7 MB/min and network-in 2.2 GB/min, on the other
> > hand, the rest of the brokers has network out 1.1 GB/min and network-in 55
> > MB/min.
> > We assume that this is due to the fact that the broker, who has been
> > stopped for more than 10 minutes, must catch up with the messages that have
> > been processed during the time while he was stopped.
> > The performance degradation persists until all 3 brokers become insync (we
> > have min.insync.replicas=2 and replication factor of 3).
> >
> > It is worth mentioning that we have ~5k messages per/sec with an average
> > size ~3kb.
> >
> > We also try to increase broker nodes to 5 (rebalanced) and run with
> > https://kafka.apache.org/081/documentation.html#prodconfig and still see
> > ~35% performance degradation.
> >
> > Thanks in advance.
> >
> > Best regards,
> > Miroslav
>
> --
> Thanx & Regard
> Ashutosh Singh