I suspect this is on Linux, right? The way Linux works is that it uses a percentage of memory to buffer new writes; at a certain point it decides it has too much buffered data and gives high priority to writing it out. The good news is that those writes are very linear, well laid out, and high-throughput. The problem is that it leads to a bit of see-saw behavior.
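Concretely, the thresholds are controlled by the vm.dirty_* sysctls. A minimal sketch of the sort of thing you might experiment with (the values here are placeholders for illustration, not tuned recommendations):

    # Start background writeback earlier and in smaller increments,
    # rather than deferring until a large fraction of RAM is dirty.
    sysctl -w vm.dirty_background_ratio=5   # async flusher threads kick in at 5% of RAM dirty
    sysctl -w vm.dirty_ratio=40             # writers are throttled once 40% of RAM is dirty

    # On large-memory boxes you may prefer absolute limits to ratios
    # (setting the *_bytes form zeroes the corresponding *_ratio):
    # sysctl -w vm.dirty_background_bytes=268435456   # 256 MB
    # sysctl -w vm.dirty_bytes=1073741824             # 1 GB

The idea is to make background writeback start sooner and run more steadily, so the kernel dribbles data out continuously instead of saving it up and then writing full bore.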
Now, the drop in performance isn't wrong per se. When your disk is writing data out, it is doing work, and read throughput will naturally be higher when you are just reading than when you are reading and writing simultaneously. So you can't get the no-writing performance while you are also writing (unless you add I/O capacity). Still, these big see-saws in performance are not ideal. You'd rather have more constant performance all the time than have Linux bounce back and forth between writing nothing and frantically writing full bore.

Fortunately Linux provides a set of pagecache tuning parameters (the vm.dirty_* sysctls sketched above) that let you control this a bit. These docs cover some of the parameters: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/s-memory-tunables.html

-Jay

On Thu, Jul 20, 2017 at 10:24 AM, Ovidiu-Cristian MARCU <
ovidiu-cristian.ma...@inria.fr> wrote:

> Hi guys,
>
> I'm relatively new to Kafka's world. I have an issue I describe below;
> maybe you can help me understand this behaviour.
>
> I'm running a benchmark using the following setup: one producer sends
> data to a topic and, concurrently, one consumer pulls it and writes it to
> another topic.
>
> Measuring the consumer throughput, I observe values around 500K records/s,
> but only until the system's cache gets filled - from that moment the
> consumer throughput drops to ~200K (2.5 times lower).
>
> Looking at disk usage, I observe disk read I/O which corresponds to the
> moment the consumer throughput drops.
>
> After some time, I kill the producer, and immediately the consumer
> throughput goes back up to its initial value of ~500K records/s.
>
> What can I do to avoid this throughput drop?
>
> Attached is an image showing disk I/O and CPU usage. I have about 128GB of
> RAM on that server, which gets filled at time ~2300.
>
> Thanks,
> Ovidiu