Hi Soumyajit,

It is possible that, following the broker restart, you benefit from fewer I/O merges than under steady state. Intuitively, that would come from a shift from a sequential workload to a more dispersed one. Your broker is also likely generating more disk reads than before the restart, especially if many pages were written back and/or released during the broker bounce.
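One way to check whether merging actually dropped is to compare two snapshots of /proc/diskstats taken a known interval apart. The Python sketch below is only an illustration under assumptions: the device name `nvme1n1` and the helper names `parse_diskstats` and `write_profile` are hypothetical, and the field positions follow the standard Linux diskstats layout (sectors are always counted as 512 bytes there). A falling merged-per-write ratio and a shrinking average write size at constant MiB/s would be consistent with the workload becoming more dispersed.

```python
from dataclasses import dataclass

SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


@dataclass
class DiskCounters:
    reads: int
    reads_merged: int
    sectors_read: int
    writes: int
    writes_merged: int
    sectors_written: int


def parse_diskstats(text: str, device: str) -> DiskCounters:
    """Extract cumulative counters for one device from /proc/diskstats text."""
    for line in text.splitlines():
        fields = line.split()
        # Layout: major minor name reads reads_merged sectors_read ms_reading
        #         writes writes_merged sectors_written ms_writing ...
        if len(fields) >= 11 and fields[2] == device:
            return DiskCounters(
                reads=int(fields[3]),
                reads_merged=int(fields[4]),
                sectors_read=int(fields[5]),
                writes=int(fields[7]),
                writes_merged=int(fields[8]),
                sectors_written=int(fields[9]),
            )
    raise ValueError(f"device {device!r} not found")


def write_profile(before: DiskCounters, after: DiskCounters, interval_s: float) -> dict:
    """Summarise device-level write behaviour between two snapshots."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    ops = after.writes - before.writes
    merged = after.writes_merged - before.writes_merged
    written_bytes = (after.sectors_written - before.sectors_written) * SECTOR_BYTES
    return {
        "write_iops": ops / interval_s,
        "write_mib_per_s": written_bytes / interval_s / 2**20,
        # Average size of a request as issued to the device, after merging.
        "avg_write_kib": written_bytes / 1024 / ops if ops else 0.0,
        # Requests merged away per request issued; lower means less merging.
        "merged_per_write": merged / ops if ops else 0.0,
    }
```

To use it, read /proc/diskstats twice (for example 60 seconds apart), pass each text through `parse_diskstats` with your volume's device name, and feed both results to `write_profile`, once during steady state and once during a spike.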
It would be interesting to know the throughput on the device itself (read and write, at steady state and during the IOPS burst). I mean the actual traffic on the disk, not the reads/writes at the file system level.

Thanks,
Alexandre

On Tue, Apr 7, 2020 at 08:42, Seva Feldman <sev...@ironsrc.com> wrote:

> We are using mainly ephemeral instances like i3en as our pattern is more
> fit for it.
>
> On Tue, Apr 7, 2020 at 10:40 AM Soumyajit Sahu <soumyajit.s...@gmail.com>
> wrote:
>
> > @Suman, thanks for confirming. I will dig more then. The instances are
> > dedicated to running Kafka, and so is the mounted volume.
> >
> > @Seva, thanks for the insight. I guess if nothing works, then we will
> > move from st1 to gp2 volumes.
> >
> > On Tue, Apr 7, 2020 at 12:28 AM Suman B N <sumannew...@gmail.com> wrote:
> >
> > > We have used st1 volumes and we never saw any issue.
> > > Yes, we are using m-series. Even t-series worked for us :D
> > >
> > > During those spikes, do you observe any background operations going on?
> > > Check server logs, controller logs.
> > >
> > > On Tue, Apr 7, 2020 at 12:49 PM Seva Feldman <sev...@ironsrc.com> wrote:
> > >
> > > > ST1 EBS fit only for sequential writes and reads. Once you have many
> > > > partitions on EBS it will be mostly random.
> > > > Interesting to monitor random vs sequential...
> > > >
> > > > We tested kafka on ST1 with 1xx partitions on each EBS and it was
> > > > constantly lagging.
> > > >
> > > > BR
> > > >
> > > > On Tue, Apr 7, 2020 at 10:06 AM Soumyajit Sahu
> > > > <soumyajit.s...@gmail.com> wrote:
> > > >
> > > > > Our typical IOPS stays at ~10K write ops/min, but it goes to 37K
> > > > > write ops/min (which is where AWS throttles).
> > > > > The spike in write ops isn't accompanied by any spike in write
> > > > > throughput or produce requests (except for the first few minutes
> > > > > of catch up). The write ops spike stays up (persistently for an
> > > > > hour or two) until we stop the broker ec2 instance for about
> > > > > 30 mins and then start it back.
> > > > >
> > > > > @Liam, no, we are not using log compaction except for a few
> > > > > consumer offset topics and config topic (for Kafka Connect), and
> > > > > schema registry store.
> > > > >
> > > > > @Suman, are you using m5 or r5 instances? Recently, we migrated
> > > > > from r5 to m5, and I wonder if that has a hand in this.
> > > > >
> > > > > We have about 1000 partitions residing on each disk, but I don't
> > > > > think that matters as most of the time the brokers run flawlessly
> > > > > (even during peak traffic hours).
> > > > >
> > > > > Thanks!
> > > > >
> > > > > On Mon, Apr 6, 2020 at 11:39 PM Suman B N <sumannew...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > We too have a similar setup but we never observed any such
> > > > > > spikes.
> > > > > >
> > > > > > Are you sure your disk IOPS is good enough? Check if that is
> > > > > > throttling.
> > > > > >
> > > > > > After a broker restarts, there might be more traffic as well
> > > > > > because of followers trying to catch up with the leader.
> > > > > >
> > > > > > -Suman
> > > > > >
> > > > > > On Tue, Apr 7, 2020 at 11:59 AM Soumyajit Sahu
> > > > > > <soumyajit.s...@gmail.com> wrote:
> > > > > >
> > > > > > > We are running Kafka on AWS EC2 instances (m5.2xlarge) with
> > > > > > > mounted EBS st1 volume (one on each machine).
> > > > > > > Occasionally, we have noticed that the write ops/second goes
> > > > > > > through the roof and we get throttled by AWS while the data
> > > > > > > throughput wouldn't have changed much. As far as our
> > > > > > > observation goes, it happens usually after a broker restart.
> > > > > > >
> > > > > > > Has anyone else come across this behavior?
> > > > > > >
> > > > > > > Thanks!
> > > > > >
> > > > > > --
> > > > > > *Suman*
> > > > > > *OlaCabs*
> > > >
> > > > --
> > > > Seva Feldman
> > > > VP R&D Mobile Delivery
> > > > [image: ironSource] <http://www.ironsrc.com/>
> > > >
> > > > email sev...@ironsrc.com
> > > > mobile +972544346089
> > > >
> > > > ironSource HQ - 121 Derech Menachem Begin st. Tel Aviv