After some tuning, I got better results. What I changed, as suggested:

dirty_ratio = 10 (previously 20)
dirty_background_ratio = 3 (previously 10)
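For reference, a minimal sketch of how these values can be checked and applied (assuming the usual /proc/sys/vm interface; writing requires root, and the equivalent /etc/sysctl.conf entries are noted in the comments) - just an illustration, not part of the benchmark:

    #!/usr/bin/env python3
    # Sketch only (assumption: standard /proc/sys/vm interface; writing needs root).
    # Equivalent to: sysctl -w vm.dirty_ratio=10 vm.dirty_background_ratio=3
    # or persisting "vm.dirty_ratio = 10" / "vm.dirty_background_ratio = 3"
    # in /etc/sysctl.conf so the change survives a reboot.
    from pathlib import Path

    SETTINGS = {"dirty_ratio": 10, "dirty_background_ratio": 3}

    for name, value in SETTINGS.items():
        knob = Path("/proc/sys/vm") / name
        print(f"{name}: current={knob.read_text().strip()} -> new={value}")
        knob.write_text(f"{value}\n")  # requires root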
The result is that disk read I/O is almost completely 0 (I have enough cache; the consumer is keeping up with the producer):
- producer throughput remains constant at ~400K records/s;
- consumer throughput (a Flink app) stays in the interval [300K/s, 500K/s] even when the cache is filled (there are some variations, but they are not influenced by the system’s cache).

I don’t know if Kafka’s documentation already says something about this, but it could be added somewhere in the documentation if you also reproduce my tests and consider it useful.

Thanks,
Ovidiu

> On 21 Jul 2017, at 01:57, Apurva Mehta <apu...@confluent.io> wrote:
>
> Hi Ovidiu,
>
> The see-saw behavior is inevitable with Linux when you have concurrent reads
> and writes. However, tuning the following two settings may help achieve more
> stable performance (from Jay's link):
>
> dirty_ratio
> Defines a percentage value. Writeout of dirty data begins (via pdflush) when
> dirty data comprises this percentage of total system memory. The default
> value is 20. Red Hat recommends a slightly lower value of 15 for database
> workloads.
>
> dirty_background_ratio
> Defines a percentage value. Writeout of dirty data begins in the background
> (via pdflush) when dirty data comprises this percentage of total memory. The
> default value is 10. For database workloads, Red Hat recommends a lower value
> of 3.
>
> Thanks,
> Apurva
>
>
> On Thu, Jul 20, 2017 at 12:25 PM, Ovidiu-Cristian MARCU
> <ovidiu-cristian.ma...@inria.fr> wrote:
> Yes, I’m using Debian Jessie 2.6 installed on this hardware [1].
>
> It is also my understanding that Kafka relies on the system’s cache (Linux in
> this case), which uses Clock-Pro as its page replacement policy and does
> complex things for general workloads. I will check the tuning parameters, but
> I was hoping for some advice on avoiding disk reads entirely, considering the
> system’s cache is used completely by Kafka and is huge (~128GB) - that is, to
> tune Clock-Pro to be smarter when used for streaming access patterns.
>
> Thanks,
> Ovidiu
>
> [1] https://www.grid5000.fr/mediawiki/index.php/Rennes:Hardware#Dell_Poweredge_R630_.28paravance.29
>
> > On 20 Jul 2017, at 21:06, Jay Kreps <j...@confluent.io> wrote:
> >
> > I suspect this is on Linux right?
> >
> > The way Linux works is it uses a percent of memory to buffer new writes; at
> > a certain point it decides it has too much buffered data and gives high
> > priority to writing it out. The good news about this is that the writes
> > are very linear, well laid out, and high-throughput. The problem is that
> > it leads to a bit of see-saw behavior.
> >
> > Now obviously the drop in performance isn't wrong. When your disk is
> > writing data out it is doing work, and obviously the read throughput will
> > be higher when you are just reading and not writing than when you are doing
> > both reading and writing simultaneously. So obviously you can't get the
> > no-writing performance when you are also writing (unless you add I/O
> > capacity).
> >
> > But still, these big see-saws in performance are not ideal.
> > You'd rather have more constant performance all the time than have Linux
> > bounce back and forth between writing nothing and then frantically writing
> > full bore. Fortunately Linux provides a set of pagecache tuning parameters
> > that let you control this a bit.
> >
> > I think these docs cover some of the parameters:
> > https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/s-memory-tunables.html
> >
> > -Jay
> >
> > On Thu, Jul 20, 2017 at 10:24 AM, Ovidiu-Cristian MARCU
> > <ovidiu-cristian.ma...@inria.fr> wrote:
> > Hi guys,
> >
> > I’m relatively new to Kafka’s world. I have an issue I describe below;
> > maybe you can help me understand this behaviour.
> >
> > I’m running a benchmark using the following setup: one producer sends data
> > to a topic and, concurrently, one consumer pulls it and writes it to
> > another topic.
> > Measuring the consumer throughput, I observe values around 500K records/s
> > only until the system’s cache gets filled - from that moment the consumer
> > throughput drops to ~200K records/s (2.5 times lower).
> > Looking at disk usage, I observe disk read I/O that corresponds to the
> > moment the consumer throughput drops.
> > After some time, I kill the producer and immediately observe the consumer
> > throughput go back up to the initial values of ~500K records/s.
> >
> > What can I do to avoid this throughput drop?
> >
> > Attached is an image showing disk I/O and CPU usage. I have about 128GB RAM
> > on that server, which gets filled at time ~2300.
> >
> > Thanks,
> > Ovidiu
> >
> > <consumer-throughput-drops.png>
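For anyone who wants to reproduce this, below is a rough sketch of the relay part of the benchmark described above (not my actual code - the real consumer is a Flink job; topic names, bootstrap server and group id are placeholders, and it assumes the kafka-python client):

    #!/usr/bin/env python3
    # Illustrative sketch: consume from one topic, republish to another,
    # and print consumer throughput every ~5 seconds.
    import time
    from kafka import KafkaConsumer, KafkaProducer  # pip install kafka-python

    consumer = KafkaConsumer("input-topic",
                             bootstrap_servers="localhost:9092",
                             group_id="relay-bench",
                             auto_offset_reset="earliest")
    producer = KafkaProducer(bootstrap_servers="localhost:9092")

    count, window_start = 0, time.time()
    for record in consumer:                 # blocks, yielding records as they arrive
        producer.send("output-topic", record.value)
        count += 1
        elapsed = time.time() - window_start
        if elapsed >= 5.0:                  # report records/s for the last window
            print(f"{count / elapsed:,.0f} records/s")
            count, window_start = 0, time.time()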