To add: the increase to 128 KiB is good, but for RAID arrays under light to medium load it is not sufficient. On a system without any load, the HDD can read ahead and always serve the next request from its buffer, thus reading at the full sequential speed of ~200 MB/s. In a RAID 10 configuration with 12 HDDs where the strip size is set to 128 KB, each HDD is hit only on every 6th request. Under load, there is enough delay between reads hitting the same drive that the read-ahead buffer often gets discarded, which basically limits the throughput to max IOPS x buffer size = ~10-20 MiB/s for 128 KiB.

I have such systems in production environments, and I often see read speeds under 10 MiB/s and read await >10 ms, which means that the read-ahead buffer has already been discarded. Under the same load conditions, if I read the data using utilities that can use a 512 KiB buffer size, I see read speeds varying between 50 and 400 MiB/s. Grep averages 2-3% CPU load on the given machine at such low read rates, so it could do much more if reading were optimized.
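To make the "max IOPS x buffer size" estimate concrete, here is a minimal Python sketch of the arithmetic. The IOPS figures (80 and 160 per drive) are assumptions picked only because they reproduce the ~10-20 MiB/s range quoted above; they are not measurements. The function names are hypothetical, and the mapping of requests to mirror pairs assumes sequential, strip-aligned requests of exactly one strip each.

```python
KIB = 1024
MIB = 1024 * 1024


def throughput_bound_mib_s(iops: float, block_kib: int) -> float:
    """Throughput ceiling in MiB/s when every request pays a full seek.

    This is simply IOPS x block size, i.e. the case where the drive's
    read-ahead buffer was discarded before the next request arrived.
    """
    return iops * block_kib * KIB / MIB


def mirror_pair_for_request(request_index: int, drives: int = 12) -> int:
    """Mirror pair that serves a request in a RAID 10 array.

    Assumes one request covers exactly one strip and requests are
    sequential, so 12 drives form 6 striped mirror pairs and the same
    pair is hit again on every 6th request.
    """
    pairs = drives // 2
    return request_index % pairs


if __name__ == "__main__":
    # Assumed per-drive random IOPS values, for illustration only.
    for iops in (80, 160):
        print(iops, "IOPS at 128 KiB:", throughput_bound_mib_s(iops, 128), "MiB/s")
    # Requests 0 and 6 land on the same mirror pair.
    print(mirror_pair_for_request(0), mirror_pair_for_request(6))
```

The ceiling grows linearly with the block size, which is why a larger read buffer helps once the read-ahead buffer can no longer be relied on.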
On 7 July 2018 at 02:33, Jim Meyering <j...@meyering.net> wrote:

> On Fri, Jul 6, 2018 at 9:26 AM, Sergiu Hlihor <s...@discovergy.com> wrote:
> > Hello,
> > I'm using grep over Ubuntu Server 14.04 (Grep version 2.16). While
> > grepping over large files I've noticed Grep is painfully slow. The
> > bottleneck seems to be the read block which is extremely low (looks like
> > 64KB). For large files residing over big HDD RAID arrays, this request
> > barely reaches one drive and based on CPU usage, grep is idling more or
> > less. Given my tests for such scenarios, a read block size of at least
> > 512KB would be way more efficient. It's very likely that optimum would be
> > 1MB+. Also, such increase in buffer size would also benefit slightly SSDs
> > where maximum sequential throughput is usually achieved when reading at
> > 256KB+ block size.
> > If this is already possible in newer versions or configurable, I'd
> > appreciate some hints about the new version which contains or about the way
> > I can configure it to increase the read block size.
>
> Thanks for raising the issue.
> This makes me think we should follow Coreutils' lead[0] and increase
> grep's initial buffer size from 32KiB, probably to 128KiB. I will time
> with the attached diff on a few systems.
>
> [0] https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=v8.22-103-g74ca6e84c

-- 
Senior Software Engineer & Team leader
Phone: +49 (0) 6221 7787-481
Email: s...@discovergy.com
Discovergy GmbH