bug#32073: Improvements in Grep

Sergiu Hlihor Fri, 06 Jul 2018 18:40:31 -0700

To add: the increase to 128KiB is good, but for RAID arrays under light to
medium load it is not sufficient. On a system without any load, the HDD
can read ahead and always serve the next request from its buffer, thus
reading at the full sequential speed of ~200MB/s. In a RAID 10 configuration
with 12 HDDs where the strip size is set to 128KB, every HDD is hit only at
every 6th request. There is enough delay between reads hitting the same
drive that the read-ahead buffer often gets discarded, which basically
limits the throughput to max IOPS x buffer size = ~10-20MiB/s for 128KiB.
I have such systems in production environments and I often see read speeds
under 10MiB/s and read await >10ms, which means that the read-ahead buffer
has already been discarded. Under the same load conditions, if I read the
data using utilities which can use a 512KiB buffer size, I see read speeds
varying between 50 and 400MiB/s. Grep has an average CPU load of 2-3% on the
given machine at such low read speeds, therefore it could do much more if
reading were optimized.
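
To make the arithmetic above concrete, here is a minimal Python sketch. It
is only an illustration, not a measurement from my systems: the IOPS values
80 and 160 are assumptions back-derived from the ~10-20MiB/s range at
128KiB, and the function names are hypothetical. The second function reads
a file sequentially with a chosen request size so that different block
sizes can be compared on a given array.

```python
import os
import time

KIB = 1024
MIB = 1024 * KIB


def seek_bound_throughput_mib_s(iops, block_size_bytes):
    """Throughput ceiling in MiB/s when every request pays for a seek.

    This is the "max IOPS x buffer size" estimate from the message.
    """
    return iops * block_size_bytes / MIB


def timed_sequential_read(path, block_size_bytes):
    """Read `path` front to back; return (bytes_read, elapsed_seconds).

    buffering=0 opens the file unbuffered, so each read() call asks the
    OS for at most `block_size_bytes` in a single request.
    """
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size_bytes)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start


if __name__ == "__main__":
    # Assumed IOPS values (80 and 160), chosen to match ~10-20MiB/s at 128KiB.
    for iops in (80, 160):
        for size in (128 * KIB, 512 * KIB):
            rate = seek_bound_throughput_mib_s(iops, size)
            print(f"{iops} IOPS x {size // KIB}KiB = {rate:.0f}MiB/s")
```

When timing real reads with timed_sequential_read, the file should be
larger than RAM or the page cache should be dropped first; otherwise the
numbers reflect cached data rather than the drives.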


On 7 July 2018 at 02:33, Jim Meyering <[email protected]> wrote:

> On Fri, Jul 6, 2018 at 9:26 AM, Sergiu Hlihor <[email protected]> wrote:
> > Hello,
> >      I'm using grep over Ubuntu Server 14.04 (Grep version 2.16). While
> > grepping over large files I've noticed Grep is painfully slow. The
> > bottleneck seems to be the read block which is extremely low (looks like
> > 64KB). For large files residing over big HDD RAID arrays, this request
> > barely reaches one drive and based on CPU usage, grep is idling more or
> > less. Given my tests for such scenarios, a read block size of at least
> > 512KB would be way more efficient. It's very likely that optimum would be
> > 1MB+. Also, such increase in buffer size would also benefit slightly SSDs
> > where maximum sequential throughput is usually achieved when reading at
> > 256KB+ block size.
> >      If this is already possible in newer versions or configurable, I'd
> > appreciate some hints about the new version which contains or about the
> > way I can configure it to increase the read block size.
>
> Thanks for raising the issue.
> This makes me think we should follow Coreutils' lead[0] and increase
> grep's initial buffer size from 32KiB, probably to 128KiB. I will time
> with the attached diff on a few systems.
>
> [0] https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=v8.22-103-g74ca6e84c
>



-- 
_____________________________________________

Senior Software Engineer & Team leader

Phone: +49 (0) 6221 7787-481

Email: [email protected]

*Discovergy GmbH*
_____________________________________________

Court of registration: Amtsgericht Aachen HRB 15391

Managing directors: Ralf Esser | Bernhard Seidl | Nikolaus Starzacher
