bug#32073: Improvements in Grep (Bug#32073)

2020-01-01 Thread Sergiu Hlihor
This topic is getting more and more frustrating. If you rely on the OS, then you are at the mercy of whatever read-ahead configuration you have. And read-ahead is typically 128KB, so it does not help that much. A HDD RAID 10 array with 12 disks and a strip size of 128KB reaches the maximum read throughput

bug#32073: Improvements in Grep (Bug#32073)

2020-01-01 Thread arnold
As a quite serious question, how is someone writing user-level code supposed to be able to figure out the right buffer size for a particular file, and to do so portably? ("Show me the code.") Gawk bases its reads on the st_blksize member in struct stat. That will typically be something like 4K -
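The st_blksize approach described above can be sketched in a few lines. This Python snippet is only an illustration and is not gawk's actual code: `preferred_read_size`, `count_lines`, and the `fallback` parameter are hypothetical names, and it assumes a POSIX system where `os.stat` exposes `st_blksize`.

```python
import os

def preferred_read_size(path, fallback=4096):
    """Return the filesystem's preferred I/O block size for path.

    Uses st_blksize from stat(2) where the platform provides it;
    fallback is a hypothetical default for platforms that do not.
    """
    st = os.stat(path)
    size = getattr(st, "st_blksize", 0)
    return size if size > 0 else fallback

def count_lines(path):
    """Count newline bytes, reading in st_blksize-sized chunks."""
    bufsize = preferred_read_size(path)
    lines = 0
    # buffering=0 gives a raw file object, so each read() is one request
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(bufsize)
            if not chunk:
                break
            lines += chunk.count(b"\n")
    return lines
```

The value reported this way is only the filesystem's hint; as the thread argues, it may be far smaller than the request size at which a RAID array or SSD reaches full throughput.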

bug#32073: Improvements in Grep (Bug#32073)

2020-01-01 Thread Paul Jackson
>> Why not just exposing the read buffer size as a configurable parameter ...
Take a look at the (and I quote) "Hairy buffering mechanism for grep" input buffering code in the grep source file grep-3.3/src/grep.c, then you tell me why it's not a runtime variable parameter. In other words, the i

bug#32073: Improvements in Grep (Bug#32073)

2020-01-01 Thread Sergiu Hlihor
Arnold, there is no need to write user code, it is already done in benchmarks. One of the standard benchmarks when testing HDDs and SSDs is read throughput vs block size and at different queue depths. Take a look at this: https://www.servethehome.com/wp-content/uploads/2019/12/Corsair-Force-MP600-
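A read-throughput-versus-block-size measurement of the kind mentioned above can be approximated from user space. The following is a minimal sketch, not one of the benchmark tools referred to in the thread: `read_throughput` and `sweep` are hypothetical names, the block sizes are arbitrary, and it only issues one synchronous read at a time (queue depth 1).

```python
import os
import time

def read_throughput(path, block_size):
    """Read path sequentially in block_size chunks; return bytes per second."""
    total = 0
    start = time.perf_counter()
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            chunk = os.read(fd, block_size)
            if not chunk:
                break
            total += len(chunk)
    finally:
        os.close(fd)
    elapsed = time.perf_counter() - start
    return total / elapsed if elapsed > 0 else float("inf")

def sweep(path, block_sizes=(4096, 32768, 131072, 1048576)):
    """Map each block size to its measured throughput for one pass over path."""
    return {bs: read_throughput(path, bs) for bs in block_sizes}
```

Results are only meaningful for a file much larger than RAM or with the page cache dropped beforehand; otherwise later passes measure memory, not the device.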

bug#32073: Improvements in Grep (Bug#32073)

2020-01-01 Thread Paul Eggert
On 1/1/20 1:15 AM, Sergiu Hlihor wrote:
> If you rely on the OS, then you are at the mercy of whatever read-ahead configuration you have.
Right, and whatever changes you make to the OS and its read-ahead configuration will work for all applications, not just for 'grep'. So, change the OS to do that.

bug#32073: Improvements in Grep (Bug#32073)

2020-01-01 Thread Sergiu Hlihor
Paul, I have to correct you. On a production server you usually have a mix of applications, often including databases. For databases, having a read-ahead means one IO less since usually database access patterns are random reads. Here it is actually best to disable read-ahead completely. In fact, I

bug#32073: Improvements in Grep (Bug#32073)

2020-01-01 Thread arnold
Hi.
Sergiu Hlihor wrote:
> Arnold, there is no need to write user code, it is already done in benchmarks. One of the standard benchmarks when testing HDDs and SSDs is read throughput vs block size and at different queue depths.
I think you're misunderstanding me, or I am misunderstanding yo

bug#32073: Improvements in Grep (Bug#32073)

2020-01-01 Thread Paul Eggert
On 1/1/20 12:04 PM, Sergiu Hlihor wrote:
> That's why the best is to be application specific
That doesn't mean that one should have to modify every application. One could instead modify the OS so that it uses different read-ahead heuristics for different classes of applications. This should be ea

bug#32073: Improvements in Grep (Bug#32073)

2020-01-01 Thread Paul Jackson
From my old Unix fart view point, Paul (the other Paul) is herding a hundred GNU cats, small command line utilities, many of which date their origins back to the 1970's, many of which have over the years grown their own internal i/o routines with specific performance specializations, but few of wh

bug#32073: Improvements in Grep (Bug#32073)

2020-01-01 Thread Jim Meyering
On Wed, Jan 1, 2020 at 12:04 PM Sergiu Hlihor wrote:
> Paul, I have to correct you. On a production server you usually have a mix of applications, often including databases. For databases, having a read-ahead means one IO less since usually database access patterns are random reads. H

bug#32073: Improvements in Grep (Bug#32073)

2020-01-01 Thread Sergiu Hlihor
Hi Jim,
The system for which this hurts me the most is an Ubuntu 14.04 where I'd need to run it as a separate binary. As I'm not familiar with the way it's built, are there any guidelines on how to build it from sources? I'd happily build it with ever larger block sizes and test. On Thu, 2 Jan 2020 a

bug#32943: grep pattern < fifo fails on Cygwin (and MinGW), but not on Linux

2020-01-01 Thread Paul Eggert
On 10/8/18 11:10 AM, Houder wrote:
> We have to wait until Corinna Vinschen returns from holidays (end of october), because Corinna Vinschen is the only one who can deal with the executive.
Corinna later pushed a patch for this Cygwin bug so I

bug#33218: Updated dfa.c: unused function charclass_context

2020-01-01 Thread Paul Eggert
Bug#33218 dated 2018-10-31 seems to have been fixed by Jim Meyering in Gnulib commit 95cd86dd7aa4425037b9c710f88fd59e38601ff1 (2018-12-15 18:09:35 UTC) and then fixed in a different way since then, so I'm closing this old bug report.

bug#32073: Improvements in Grep (Bug#32073)

2020-01-01 Thread Jim Meyering
On Wed, Jan 1, 2020 at 5:04 PM Sergiu Hlihor wrote:
> The system for which this hurts me the most is an Ubuntu 14.04 where I'd need to run it as a separate binary. As I'm not familiar with the way it's built, are there any guidelines on how to build it from sources? I'd happily build it with

bug#33291: "blank" character class not documented

2020-01-01 Thread Paul Eggert
On 11/6/18 8:40 AM, Steven Penny wrote:
> GNU Grep supports the "[:blank:]" class as well - but is not documented:
> http://git.savannah.gnu.org/cgit/grep.git/tree/doc/grep.in.1?id=30e666c#n796
Thanks for reporting this mismatch between the main documentation and the man page. I installed the at

bug#32073: Improvements in Grep (Bug#32073)

2020-01-01 Thread Sergiu Hlihor
Hi Arnold,
If AWKBUFSIZE translates to disk IO request size then it is already what is needed. However it's a little annoying. Regarding optimal settings, the benchmark actually tells you what is optimal. Let's assume grep or any other tool can process in memory 3GB/s. If your device can serve 5
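The message above is cut off, so the following only illustrates the visible part of the reasoning: given benchmark results and a tool's in-memory processing rate, one could pick the smallest block size at which the device keeps up. This is a sketch under that assumption. `pick_block_size` is a hypothetical helper, and the throughput figures are invented for the example, not measurements.

```python
def pick_block_size(throughput_by_block, processing_rate):
    """Pick the smallest block size whose throughput meets processing_rate.

    If no block size is fast enough, fall back to the one with the
    highest measured throughput.
    """
    fast_enough = [bs for bs, rate in throughput_by_block.items()
                   if rate >= processing_rate]
    if fast_enough:
        return min(fast_enough)
    return max(throughput_by_block, key=throughput_by_block.get)

# Hypothetical benchmark results in bytes per second (invented numbers).
measured = {4096: 0.4e9, 131072: 1.8e9, 1048576: 3.5e9, 4194304: 3.6e9}

# With a tool that processes 3 GB/s, 1 MiB is the smallest size that keeps up.
print(pick_block_size(measured, processing_rate=3e9))  # prints 1048576
```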

bug#32073: Improvements in Grep (Bug#32073)

2020-01-01 Thread arnold
Hi.
Sergiu Hlihor wrote:
> Hi Arnold,
> If AWKBUFSIZE translates to disk IO request size then it is already what is needed. However it's a little annoying.
How would you make it less annoying?
Thanks,
Arnold