We mainly use instances with ephemeral (instance store) disks, such as i3en, because they fit our access pattern better.
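A rough way to look at the random vs sequential question raised in the thread below is to compare write ops against write throughput. If the ops count jumps while the bytes written stay flat, the average write size must have dropped, which usually points to smaller, more scattered writes. The sketch below is only an illustration: the function name is hypothetical, and the 40 MiB/s throughput figure is an assumed placeholder, not a number from this thread. Only the ~10K and 37K write ops/min values come from the messages below.

```python
# Illustrative sketch only: estimate the average write size from two
# monitoring numbers (write throughput and write ops per minute).
# The function name and the 40 MiB/s figure are assumptions for the example.

def avg_write_size_kib(write_bytes_per_sec: float, write_ops_per_min: float) -> float:
    """Return the average size of one write operation, in KiB."""
    if write_ops_per_min <= 0:
        raise ValueError("write_ops_per_min must be positive")
    ops_per_sec = write_ops_per_min / 60.0
    return write_bytes_per_sec / ops_per_sec / 1024.0


if __name__ == "__main__":
    assumed_throughput = 40 * 1024 * 1024  # 40 MiB/s, assumed placeholder

    normal = avg_write_size_kib(assumed_throughput, 10_000)  # ~10K ops/min
    spike = avg_write_size_kib(assumed_throughput, 37_000)   # 37K ops/min

    print(f"normal: {normal:.1f} KiB per write")
    print(f"spike:  {spike:.1f} KiB per write")
```

With the throughput held constant, the same volume of data is split into 3.7 times as many operations during the spike, so each write is 3.7 times smaller on average. Plugging in real throughput and ops numbers from CloudWatch or iostat would show whether that is what actually happens after a restart.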

On Tue, Apr 7, 2020 at 10:40 AM Soumyajit Sahu <soumyajit.s...@gmail.com> wrote:

> @Suman, thanks for confirming. I will dig more then. The instances are
> dedicated to running Kafka, and so is the mounted volume.
>
> @Seva, thanks for the insight. I guess if nothing works, then we will move
> from st1 to gp2 volumes.
>
> On Tue, Apr 7, 2020 at 12:28 AM Suman B N <sumannew...@gmail.com> wrote:
>
> > We have used st1 volumes and we never saw any issue.
> > Yes, we are using m-series. Even t-series worked for us :D
> >
> > During those spikes, do you observe any background operations going on?
> > Check server logs, controller logs.
> >
> > On Tue, Apr 7, 2020 at 12:49 PM Seva Feldman <sev...@ironsrc.com> wrote:
> >
> > > ST1 EBS is fit only for sequential writes and reads. Once you have many
> > > partitions on EBS it will be mostly random.
> > > Interesting to monitor random vs sequential...
> > >
> > > We tested Kafka on ST1 with 1xx partitions on each EBS and it was
> > > constantly lagging.
> > >
> > > BR
> > >
> > > On Tue, Apr 7, 2020 at 10:06 AM Soumyajit Sahu <soumyajit.s...@gmail.com> wrote:
> > >
> > > > Our typical IOPS stays at ~10K write ops/min, but it goes to 37K write
> > > > ops/min (which is where AWS throttles).
> > > > The spike in write ops isn't accompanied by any spike in write throughput
> > > > or produce requests (except for the first few minutes of catch up). The
> > > > write ops spike stays up (persistently for an hour or two) until we stop
> > > > the broker ec2 instance for about 30 mins and then start it back.
> > > >
> > > > @Liam, no, we are not using log compaction except for a few consumer offset
> > > > topics and config topic (for Kafka Connect), and schema registry store.
> > > >
> > > > @Suman, are you using m5 or r5 instances? Recently, we migrated from r5 to
> > > > m5, and I wonder if that has a hand in this.
> > > >
> > > > We have about 1000 partitions residing on each disk, but I don't think that
> > > > matters as most of the time the brokers run flawlessly (even during peak
> > > > traffic hours).
> > > >
> > > > Thanks!
> > > >
> > > > On Mon, Apr 6, 2020 at 11:39 PM Suman B N <sumannew...@gmail.com> wrote:
> > > >
> > > > > We too have a similar setup but we never observed any such spikes.
> > > > >
> > > > > Are you sure your disk IOPS is good enough? Check if that is throttling.
> > > > >
> > > > > After a broker restarts, there might be more traffic as well because of
> > > > > followers trying to catch up with the leader.
> > > > >
> > > > > -Suman
> > > > >
> > > > > On Tue, Apr 7, 2020 at 11:59 AM Soumyajit Sahu <soumyajit.s...@gmail.com> wrote:
> > > > >
> > > > > > We are running Kafka on AWS EC2 instances (m5.2xlarge) with mounted EBS st1
> > > > > > volume (one on each machine).
> > > > > > Occasionally, we have noticed that the write ops/second goes through the
> > > > > > roof and we get throttled by AWS while the data throughput wouldn't have
> > > > > > changed much. As far as our observation goes, it happens usually after a
> > > > > > broker restart.
> > > > > >
> > > > > > Has anyone else come across this behavior?
> > > > > >
> > > > > > Thanks!
> > > > >
> > > > > --
> > > > > *Suman*
> > > > > *OlaCabs*

--
Seva Feldman
VP R&D Mobile Delivery
ironSource <http://www.ironsrc.com/>
email sev...@ironsrc.com
mobile +972544346089
ironSource HQ - 121 Derech Menachem Begin st. Tel Aviv