On Fri, Jul 6, 2018 at 9:26 AM, Sergiu Hlihor <s...@discovergy.com> wrote:
> Hello,
> I'm using grep on Ubuntu Server 14.04 (grep version 2.16). While
> grepping large files, I've noticed that grep is painfully slow. The
> bottleneck seems to be the read block size, which is extremely small (it
> looks like 64KB). For large files residing on big HDD RAID arrays, such a
> request barely reaches one drive, and judging by CPU usage, grep is more
> or less idling. Based on my tests for such scenarios, a read block size of
> at least 512KB would be far more efficient, and the optimum is very likely
> 1MB or more. Such an increase in buffer size would also slightly benefit
> SSDs, where maximum sequential throughput is usually reached at block
> sizes of 256KB and above.
> If this is already possible or configurable in newer versions, I'd
> appreciate a hint about which version contains it, or about how I can
> configure grep to increase the read block size.
Thanks for raising the issue. This makes me think we should follow Coreutils' lead[0] and increase grep's initial buffer size from 32KiB, probably to 128KiB. I will time the attached diff on a few systems.

[0] https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=v8.22-103-g74ca6e84c
grep-bufsize-increase.diff