@Suman, thanks for confirming. I will dig more then. The instances are dedicated to running Kafka, and so is the mounted volume.
@Seva, thanks for the insight. I guess if nothing works, then we will move from st1 to gp2 volumes.

On Tue, Apr 7, 2020 at 12:28 AM Suman B N <sumannew...@gmail.com> wrote:

> We have used st1 volumes and we never saw any issue.
> Yes, we are using m-series. Even t-series worked for us :D
>
> During those spikes, do you observe any background operations going on?
> Check server logs, controller logs.
>
> On Tue, Apr 7, 2020 at 12:49 PM Seva Feldman <sev...@ironsrc.com> wrote:
>
> > ST1 EBS fit only for sequential writes and reads. Once you have many
> > partitions on EBS it will be mostly random.
> > Interesting to monitor random vs sequential...
> >
> > We tested kafka on ST1 with 1xx partitions on each EBS and it was
> > constantly lagging.
> >
> > BR
> >
> > On Tue, Apr 7, 2020 at 10:06 AM Soumyajit Sahu <soumyajit.s...@gmail.com>
> > wrote:
> >
> > > Our typical IOPS stays at ~10K write ops/min, but it goes to 37K write
> > > ops/min (which is where AWS throttles).
> > > The spike in write ops isn't accompanied by any spike in write throughput
> > > or produce requests (except for the first few minutes of catch up). The
> > > write ops spike stays up (persistently for an hour or two) until we stop
> > > the broker ec2 instance for about 30 mins and then start it back.
> > >
> > > @Liam, no, we are not using log compaction except for a few consumer offset
> > > topics and config topic (for Kafka Connect), and schema registry store.
> > >
> > > @Suman, are you using m5 or r5 instances? Recently, we migrated from r5 to
> > > m5, and I wonder if that has a hand in this.
> > >
> > > We have about 1000 partitions residing on each disk, but I don't think that
> > > matters as most of the time the brokers run flawlessly (even during peak
> > > traffic hours).
> > >
> > > Thanks!
> > >
> > > On Mon, Apr 6, 2020 at 11:39 PM Suman B N <sumannew...@gmail.com> wrote:
> > >
> > > > We too have a similar setup but we never observed any such spikes.
> > > >
> > > > Are you sure your disk IOPS is good enough? Check if that is throttling.
> > > >
> > > > After a broker restarts, there might be more traffic as well because of
> > > > followers trying to catch up with the leader.
> > > >
> > > > -Suman
> > > >
> > > > On Tue, Apr 7, 2020 at 11:59 AM Soumyajit Sahu <soumyajit.s...@gmail.com>
> > > > wrote:
> > > >
> > > > > We are running Kafka on AWS EC2 instances (m5.2xlarge) with mounted EBS
> > > > > st1 volume (one on each machine).
> > > > > Occasionally, we have noticed that the write ops/second goes through
> > > > > the roof and we get throttled by AWS while the data throughput wouldn't
> > > > > have changed much. As far as our observation goes, it happens usually
> > > > > after a broker restart.
> > > > >
> > > > > Has anyone else come across this behavior?
> > > > >
> > > > > Thanks!
> > > >
> > > > --
> > > > *Suman*
> > > > *OlaCabs*
> >
> > --
> > Seva Feldman
> > VP R&D Mobile Delivery
> > [image: ironSource] <http://www.ironsrc.com/>
> >
> > email sev...@ironsrc.com
> > mobile +972544346089
> >
> > ironSource HQ - 121 Derech Menachem Begin st. Tel Aviv
>
> --
> *Suman*
> *OlaCabs*
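
Following up on the suggestion in the thread to monitor random vs sequential I/O: one rough way to check whether a write-ops spike is made of many small writes is to sample /proc/diskstats on the broker twice and divide the bytes written by the writes completed. The sketch below assumes a Linux broker; the device name "nvme1n1" and the 60-second interval are placeholders, not values from this thread. It measures what the OS issues to the device, which may not match how AWS counts I/O for throttling, so treat the numbers as a hint only.

```python
import time

SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def parse_diskstats(text, device):
    """Return (writes_completed, sectors_written) for one block device."""
    for line in text.splitlines():
        fields = line.split()
        # Layout: major minor name reads... ; writes completed is the 8th
        # field and sectors written is the 10th.
        if len(fields) >= 10 and fields[2] == device:
            return int(fields[7]), int(fields[9])
    raise ValueError(f"device {device!r} not found in diskstats")


def write_profile(before, after, interval_s):
    """Return (write ops per second, average KiB per write) between samples."""
    ops = after[0] - before[0]
    sectors = after[1] - before[1]
    ops_per_sec = ops / interval_s
    avg_kib = (sectors * SECTOR_BYTES / 1024 / ops) if ops else 0.0
    return ops_per_sec, avg_kib


def sample(device="nvme1n1", interval_s=60):
    """Take two samples on the broker itself (device name is an assumption)."""
    with open("/proc/diskstats") as f:
        before = parse_diskstats(f.read(), device)
    time.sleep(interval_s)
    with open("/proc/diskstats") as f:
        after = parse_diskstats(f.read(), device)
    ops_per_sec, avg_kib = write_profile(before, after, interval_s)
    print(f"{ops_per_sec * 60:.0f} write ops/min, {avg_kib:.1f} KiB per write")
```

Calling sample() during normal operation and again during a spike gives two figures to compare. If write ops/min rises while KiB per write drops sharply, that would be consistent with small, scattered writes rather than more data being produced.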