On Wed, Jan 1, 2020 at 12:04 PM Sergiu Hlihor <s...@discovergy.com> wrote:
> Paul, I have to correct you. On a production server you have usually a mix of
> applications many times including databases. For databases, having a read
> ahead means one IO less since usually database access patterns are random
> reads. Here actually best is to disable completely read ahead. In fact, I do
> have to say that probably best is to disable completely read ahead and let
> applications deal with it, either in an automatic fashion, like reading the
> optimal IO block size from device or in a configurable way with defaults
> good enough for today's servers. If you now configure the OS to do a read
> ahead hitting all HDDs then you induce potentially unnecessary IO load for
> all applications which use it, which when having HDDs is totally
> unacceptable. That's why the best is to be application specific and ideally
> configured to use optimal IO block size.
>
> So no, letting OS to do it is stupid.
>
> On Wed, 1 Jan 2020 at 20:42, Paul Eggert <egg...@cs.ucla.edu> wrote:
>>
>> On 1/1/20 1:15 AM, Sergiu Hlihor wrote:
>> > If you rely on OS, then
>> > you are at the mercy of whatever read ahead configuration you have.
>>
>> Right, and whatever changes you make to the OS and its read-ahead
>> configuration will work for all applications, not just for 'grep'. So,
>> change the OS to do that. There shouldn't be a need to change 'grep' in
>> particular (or 'cp' in particular, or 'awk' in particular, etc.).
>>
>> > The issue of large
>> > block sizes for IO operations is widespread across all tools from Linux,
>> > like rsync or cp and its only getting worse
>>
>> Quite right. And it would be painful to have to modify all those tools,
>> and to maintain those modifications. So modify the OS instead. Scheduling
>> read-ahead is really the OS's job anyway.

Hi Sergiu,

If you would like to help make grep use larger buffer sizes, please run and
report benchmarks measuring how much of a difference it would make, at least
for your hardware.

Here are some of the tests I ran to justify raising it from ~32k to ~96k:
https://lists.gnu.org/archive/html/grep-devel/2018-10/msg00002.html
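
For anyone who wants a starting point, here is a minimal sketch of such a
measurement in Python. It is not the method used in the tests linked above
and it does not exercise grep itself; it only times sequential read() calls
of different sizes against one file. The function names (time_read, compare)
and the list of sizes are assumptions made for illustration.

```python
import os
import sys
import time


def time_read(path, bufsize):
    """Read the whole file with os.read() calls of at most `bufsize` bytes.

    Returns (total_bytes_read, elapsed_seconds). Hypothetical helper,
    not part of grep or any existing tool.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        total = 0
        start = time.perf_counter()
        while True:
            chunk = os.read(fd, bufsize)
            if not chunk:  # empty bytes object means end of file
                break
            total += len(chunk)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return total, elapsed


def compare(path, sizes=(32 * 1024, 96 * 1024, 256 * 1024, 1024 * 1024)):
    """Time one full read of `path` per buffer size.

    Returns a dict mapping buffer size to (total_bytes, elapsed_seconds).
    The default sizes are illustrative, chosen around the 32k and 96k
    values mentioned in this thread.
    """
    return {size: time_read(path, size) for size in sizes}


if __name__ == "__main__" and len(sys.argv) > 1:
    for size, (total, elapsed) in compare(sys.argv[1]).items():
        print(f"{size:>8} bytes/read: {total} bytes in {elapsed:.4f} s")
```

Note that after the first pass the file is likely to be served from the page
cache, so the later buffer sizes will look faster than they are on a cold
disk. To compare sizes fairly, the cache would need to be cold before each
run (on Linux, for example, by writing 3 to /proc/sys/vm/drop_caches as
root), and the file should be much larger than a few megabytes.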