As a quite serious question: how is someone writing user-level code supposed to figure out the right buffer size for a particular file, and to do so portably? ("Show me the code.")
Gawk bases its reads on the st_blksize member in struct stat. That will typically be something like 4K - not nearly enough, given your description below.

Arnold

Sergiu Hlihor <s...@discovergy.com> wrote:

> This topic is getting more and more frustrating. If you rely on the OS,
> then you are at the mercy of whatever read-ahead configuration you have,
> and read-ahead is typically 128KB, so it does not help that much. A HDD
> RAID 10 array with 12 disks and a stripe size of 128KB (6 mirrored pairs,
> so 6 independent stripes) reaches its maximum read throughput only when
> the read block size is 6 * 128KB = 768KB. When issuing 128KB read
> requests, you only hit one HDD, getting 1/6 of the read throughput. The
> same holds for flash: a state-of-the-art SSD that can do 5GB/s reads may
> actually do around 1GB/s or less at a 128KB block size. Why is it so hard
> to understand how the hardware works, and the fact that you need huge
> block sizes to actually read at full speed? Why not just expose the read
> buffer size as a configurable parameter, so anyone can tune it as needed?
> 96KB is purely retarded.
>
> On Wed, 1 Jan 2020 at 08:52, Paul Eggert <egg...@cs.ucla.edu> wrote:
>
> > > This makes me think we should follow Coreutils' lead[0] and increase
> > > grep's initial buffer size from 32KiB, probably to 128KiB.
> >
> > I see that Jim later installed a patch increasing it to 96 KiB.
> >
> > Whatever number is chosen, it's "wrong" for some configuration. And I
> > suppose the particular configuration that Sergiu Hlihor mentioned could
> > be tweaked so that it worked better with grep (and with other
> > programs).
> >
> > I'm inclined to mark this bug report as a wishlist item, in the sense
> > that it'd be nice if grep and/or the OS could pick buffer sizes more
> > intelligently (though it's not clear how grep and/or the OS could go
> > about this).
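
In the spirit of "show me the code", here is a minimal C sketch of the two ideas in this thread combined: start from st_blksize the way gawk does, clamp it to a sane range, and honor the user override Sergiu asks for. This is illustrative only, not grep's or gawk's actual code; the GREP_BUFSIZE variable name and the floor/cap values are hypothetical.

/* Minimal sketch (not grep's or gawk's actual code): choose a read
   buffer size from st_blksize, clamped to a sane range, with an
   optional user override.  The GREP_BUFSIZE variable name and the
   floor/cap values below are hypothetical.  */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>

#define MIN_BUFSIZE (32 * 1024)        /* floor: 32 KiB */
#define MAX_BUFSIZE (8 * 1024 * 1024)  /* cap: 8 MiB */

static size_t
choose_bufsize (int fd)
{
  /* Hypothetical override, per the suggestion in this thread: let the
     user tune the buffer size instead of guessing for them.  */
  const char *env = getenv ("GREP_BUFSIZE");
  if (env)
    {
      long v = atol (env);
      if (v >= MIN_BUFSIZE && v <= MAX_BUFSIZE)
        return (size_t) v;
    }

  /* Default: follow st_blksize as gawk does, but never drop below the
     floor, since st_blksize is typically only 4 KiB.  */
  struct stat st;
  size_t n = MIN_BUFSIZE;
  if (fstat (fd, &st) == 0
      && st.st_blksize > 0
      && (size_t) st.st_blksize > n)
    n = (size_t) st.st_blksize;
  return n <= MAX_BUFSIZE ? n : MAX_BUFSIZE;
}

int
main (void)
{
  printf ("buffer size for stdin: %zu bytes\n", choose_bufsize (0));
  return 0;
}

On the RAID 10 array described above, setting the hypothetical override to 786432 (768KB) would let each read span all six stripes; the point is that no single built-in default gets every configuration right.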