Our typical IOPS stays at ~10K write ops/min, but it occasionally jumps to 37K write ops/min (which is where AWS throttles us). The spike in write ops isn't accompanied by any spike in write throughput or produce requests (except for the first few minutes of catch-up). The elevated write ops persist for an hour or two, until we stop the broker EC2 instance for about 30 minutes and then start it again.
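One quick way to see whether the extra ops come from smaller writes rather than more data is to divide bytes written by write ops over the same window (for example, using the CloudWatch VolumeWriteBytes and VolumeWriteOps metrics for the volume). The Python sketch below only does the arithmetic; it calls no AWS API, and the 600 MiB/min throughput figure is a hypothetical stand-in, not a measured value from our brokers.

```python
# Minimal sketch: derive per-second rates and average write size from
# per-window counters. No AWS API is called; inputs are plain numbers.

MIB = 1024 * 1024


def ops_per_second(ops_per_minute: float) -> float:
    """Convert a per-minute operation count into a per-second rate."""
    return ops_per_minute / 60.0


def avg_write_size_kib(write_bytes: float, write_ops: float) -> float:
    """Average bytes per write op over one window, expressed in KiB."""
    if write_ops <= 0:
        raise ValueError("write_ops must be positive")
    return write_bytes / write_ops / 1024.0


if __name__ == "__main__":
    # HYPOTHETICAL throughput: 600 MiB written per minute in both cases,
    # i.e. throughput stays flat while write ops jump.
    bytes_per_minute = 600 * MIB
    for ops in (10_000, 37_000):  # ops/min figures quoted in the message
        print(
            f"{ops} ops/min = {ops_per_second(ops):.0f} ops/s, "
            f"avg write = {avg_write_size_kib(bytes_per_minute, ops):.1f} KiB"
        )
```

With flat throughput, going from 10K to 37K ops/min means each write shrinks to roughly a quarter of its previous size (about 61.4 KiB down to about 16.6 KiB in this hypothetical case), so the same data is being issued as many more, smaller I/Os.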
@Liam, no, we are not using log compaction except for a few consumer offset topics, the config topic (for Kafka Connect), and the Schema Registry store.

@Suman, are you using m5 or r5 instances? We recently migrated from r5 to m5, and I wonder whether that has something to do with this. We have about 1000 partitions on each disk, but I don't think that matters, since most of the time the brokers run flawlessly (even during peak traffic hours).

Thanks!

On Mon, Apr 6, 2020 at 11:39 PM Suman B N <sumannew...@gmail.com> wrote:

> We have a similar setup too, but we have never observed any such spikes.
>
> Are you sure your disk IOPS is sufficient? Check whether that is what is being throttled.
>
> After a broker restarts, there may also be more traffic because followers are trying to catch up with the leader.
>
> -Suman
>
> On Tue, Apr 7, 2020 at 11:59 AM Soumyajit Sahu <soumyajit.s...@gmail.com> wrote:
>
> > We are running Kafka on AWS EC2 instances (m5.2xlarge), each with a mounted EBS st1 volume (one per machine).
> > Occasionally, we have noticed that write ops/second go through the roof and we get throttled by AWS, while the data throughput hardly changes. As far as we have observed, it usually happens after a broker restart.
> >
> > Has anyone else come across this behavior?
> >
> > Thanks!
>
> --
> *Suman*
> *OlaCabs*