Yes, I’m using Debian Jessie 2.6 installed on this hardware [1]. It is also my understanding that Kafka relies on the system’s page cache (Linux in this case), which uses Clock-Pro as its page replacement policy and does complex things to handle general workloads. I will check the tuning parameters, but I was hoping for some advice on avoiding disk reads entirely, considering the system’s cache is used completely by Kafka and is huge (~128GB) - that is, on tuning Clock-Pro to be smarter for streaming access patterns.
Thanks,
Ovidiu

[1] https://www.grid5000.fr/mediawiki/index.php/Rennes:Hardware#Dell_Poweredge_R630_.28paravance.29

> On 20 Jul 2017, at 21:06, Jay Kreps <j...@confluent.io> wrote:
>
> I suspect this is on Linux right?
>
> The way Linux works is it uses a percent of memory to buffer new writes; at a
> certain point it decides it has too much buffered data and gives high
> priority to writing that out. The good news is that these writes are
> very linear, well laid out, and high-throughput. The problem is that they
> lead to a bit of see-saw behavior.
>
> Now obviously the drop in performance isn't wrong. When your disk is writing
> data out it is doing work, and read throughput will obviously be higher
> when you are just reading than when you are reading and writing
> simultaneously. So you can't get the no-writing performance while you are
> also writing (unless you add I/O capacity).
>
> But still, these big see-saws in performance are not ideal. You'd rather have
> constant performance all the time than have Linux bounce back and
> forth between writing nothing and frantically writing full bore.
> Fortunately, Linux provides a set of pagecache tuning parameters that let you
> control this a bit.
>
> I think these docs cover some of the parameters:
> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/s-memory-tunables.html
>
> -Jay
>
> On Thu, Jul 20, 2017 at 10:24 AM, Ovidiu-Cristian MARCU
> <ovidiu-cristian.ma...@inria.fr> wrote:
> Hi guys,
>
> I’m relatively new to Kafka’s world. I have an issue I describe below; maybe
> you can help me understand this behaviour.
>
> I’m running a benchmark using the following setup: one producer sends data to
> a topic and, concurrently, one consumer pulls it and writes it to another topic.
> Measuring the consumer throughput, I observe values around 500K records/s,
> but only until the system’s cache fills up - from that moment the consumer
> throughput drops to ~200K (2.5 times lower).
> Looking at disk usage, I observe disk read I/O that corresponds to the
> moment the consumer throughput drops.
> After some time, I kill the producer and immediately the consumer
> throughput goes back up to the initial ~500K records/s.
>
> What can I do to avoid this throughput drop?
>
> Attached is an image showing disk I/O and CPU usage. I have about 128GB RAM on
> that server, which gets filled at time ~2300.
>
> Thanks,
> Ovidiu
>
> <consumer-throughput-drops.png>
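For reference, the pagecache tunables Jay mentions can be set via sysctl. The sketch below is illustrative only - the specific values are assumptions chosen to show the idea (start background writeback earlier and widen the gap to the hard throttling ceiling, so writeback drains continuously instead of in bursts), not recommendations tested on this hardware:

```
# /etc/sysctl.d/90-kafka-pagecache.conf  (illustrative values, not tuned advice)

# Kick off background writeback when dirty pages reach 2% of memory
# (default is commonly 10), so writes trickle out continuously.
vm.dirty_background_ratio = 2

# Only throttle writers once dirty pages reach 40% of memory
# (default is commonly 20), widening the gap between gentle background
# writeback and full-bore foreground flushing.
vm.dirty_ratio = 40
```

Apply with `sysctl -p /etc/sysctl.d/90-kafka-pagecache.conf` (or `sysctl -w` for a one-off test), and watch the Dirty/Writeback lines in /proc/meminfo while the benchmark runs to see whether the see-saw flattens out.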