bug#32073: Improvements in Grep (Bug#32073)

2020-01-02 Thread arnold
OK, thanks for the input. Arnold Sergiu Hlihor wrote: > Hi Arnold, > Annoying in the sense that you have to specify it with every usage. In a > company where you have 10+ developers grepping over various logs, each one > has to remember to add the extra parameter. Easier would be to have some >

bug#32073: Improvements in Grep (Bug#32073)

2020-01-02 Thread Sergiu Hlihor
Hi Arnold, Annoying in the sense that you have to specify it with every usage. In a company where you have 10+ developers grepping over various logs, each one has to remember to add the extra parameter. Easier would be to have some kind of global configuration that the system admin can set and deve
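The "global configuration" idea above can be sketched roughly as follows. This is only an illustration, not anything grep or gawk implements: the variable name `MYTOOL_BUFSIZE`, the helper `resolve_bufsize`, and the 96 KiB default are all assumptions made for this sketch. The point is that an administrator could set the value once (for example in a system-wide profile) and every user would inherit it without passing an extra parameter.

```python
import os

# Assumed default for this sketch only (96 KiB).
DEFAULT_BUFSIZE = 96 * 1024


def resolve_bufsize(environ=None, var="MYTOOL_BUFSIZE", default=DEFAULT_BUFSIZE):
    """Return the read buffer size in bytes.

    The value is taken from the (hypothetical) environment variable `var`
    when it holds a positive integer; otherwise `default` is used.
    """
    env = os.environ if environ is None else environ
    raw = env.get(var)
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        # Ignore malformed settings rather than failing the whole tool.
        return default
    return value if value > 0 else default


if __name__ == "__main__":
    print(resolve_bufsize())
```

With such a scheme, a per-invocation option could still override the environment, so the admin-wide setting acts only as a default.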

bug#32073: Improvements in Grep (Bug#32073)

2020-01-01 Thread arnold
Hi. Sergiu Hlihor wrote: > Hi Arnold, > If AWKBUFSIZE translates to disk IO request size then it is already what > is needed. However it's a little annoying. How would you make it less annoying? Thanks, Arnold

bug#32073: Improvements in Grep (Bug#32073)

2020-01-01 Thread Sergiu Hlihor
Hi Arnold, If AWKBUFSIZE translates to disk IO request size then it is already what is needed. However it's a little annoying. Regarding optimal settings, the benchmark actually tells you what is optimal. Let's assume grep or any other tool can process 3GB/s in memory. If your device can serve 5

bug#32073: Improvements in Grep (Bug#32073)

2020-01-01 Thread Jim Meyering
On Wed, Jan 1, 2020 at 5:04 PM Sergiu Hlihor wrote: > The system for which this hurts me the most is an Ubuntu 14.04 where I'd need > to run it as a separate binary. As I'm not familiar with the way it's built, > are there any guidelines on how to build it from source? I'd happily build it > with

bug#32073: Improvements in Grep (Bug#32073)

2020-01-01 Thread Sergiu Hlihor
Hi Jim, The system for which this hurts me the most is an Ubuntu 14.04 where I'd need to run it as a separate binary. As I'm not familiar with the way it's built, are there any guidelines on how to build it from source? I'd happily build it with ever larger block sizes and test. On Thu, 2 Jan 2020 a

bug#32073: Improvements in Grep (Bug#32073)

2020-01-01 Thread Jim Meyering
On Wed, Jan 1, 2020 at 12:04 PM Sergiu Hlihor wrote: > Paul, I have to correct you. On a production server you have usually a mix of > applications many times including databases. For databases, having a read > ahead means one IO less since usually database access patterns are random > reads. H

bug#32073: Improvements in Grep (Bug#32073)

2020-01-01 Thread Paul Jackson
From my old Unix fart view point, Paul (the other Paul) is herding a hundred GNU cats, small command line utilities, many of which date their origins back to the 1970's, many of which have over the years grown their own internal i/o routines with specific performance specializations, but few of wh

bug#32073: Improvements in Grep (Bug#32073)

2020-01-01 Thread Paul Eggert
On 1/1/20 12:04 PM, Sergiu Hlihor wrote: > That's why the best is to be application specific That doesn't mean that one should have to modify every application. One could instead modify the OS so that it uses different read-ahead heuristics for different classes of applications. This should be ea

bug#32073: Improvements in Grep (Bug#32073)

2020-01-01 Thread arnold
Hi. Sergiu Hlihor wrote: > Arnold, there is no need to write user code, it is already done in > benchmarks. One of the standard benchmarks when testing HDDs and SSDs is > read throughput vs block size and at different queue depths. I think you're misunderstanding me, or I am misunderstanding yo

bug#32073: Improvements in Grep (Bug#32073)

2020-01-01 Thread Sergiu Hlihor
Paul, I have to correct you. On a production server you have usually a mix of applications many times including databases. For databases, having a read ahead means one IO less since usually database access patterns are random reads. Here actually best is to disable completely read ahead. In fact, I

bug#32073: Improvements in Grep (Bug#32073)

2020-01-01 Thread Paul Eggert
On 1/1/20 1:15 AM, Sergiu Hlihor wrote: > If you rely on OS, then > you are at the mercy of whatever read ahead configuration you have. Right, and whatever changes you make to the OS and its read-ahead configuration will work for all applications, not just for 'grep'. So, change the OS to do that.

bug#32073: Improvements in Grep (Bug#32073)

2020-01-01 Thread Sergiu Hlihor
Arnold, there is no need to write user code, it is already done in benchmarks. One of the standard benchmarks when testing HDDs and SSDs is read throughput vs block size and at different queue depths. Take a look at this: https://www.servethehome.com/wp-content/uploads/2019/12/Corsair-Force-MP600-
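For readers unfamiliar with the kind of benchmark mentioned here, the following is a minimal sketch of a "read throughput vs block size" sweep. It is a toy, not one of the storage benchmarks referred to above: the function names `read_throughput` and `sweep` are made up for illustration, it measures a single sequential reader (queue depth 1), and on a small file the data will most likely come from the page cache rather than the device, so the absolute numbers mean little.

```python
import os
import tempfile
import time


def read_throughput(path, block_size):
    """Read path sequentially in block_size chunks; return (bytes, seconds)."""
    total = 0
    start = time.perf_counter()
    # buffering=0 opens an unbuffered raw file, so every read() call issues
    # at most one read request of block_size bytes.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start


def sweep(path, block_sizes):
    """Measure each block size once; return {block_size: throughput in MB/s}."""
    results = {}
    for bs in block_sizes:
        nbytes, secs = read_throughput(path, bs)
        # Guard against a zero elapsed time on very small files.
        results[bs] = nbytes / max(secs, 1e-9) / 1e6
    return results


if __name__ == "__main__":
    # Build an 8 MiB scratch file; a real test would use a large file on the
    # device under test and take care to bypass or drop the page cache.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(8 * 1024 * 1024))
    try:
        sizes = [4096, 32 * 1024, 128 * 1024, 512 * 1024, 1024 * 1024]
        for bs, mbps in sweep(tmp.name, sizes).items():
            print(f"{bs:>8} bytes: {mbps:10.1f} MB/s")
    finally:
        os.unlink(tmp.name)
```

A sweep like this only shows the shape of the curve for one machine and one file; it does not answer the portability question raised elsewhere in the thread.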

bug#32073: Improvements in Grep (Bug#32073)

2020-01-01 Thread Paul Jackson
>> Why not just exposing the read buffer size as a configurable parameter ... Take a look at the (and I quote) "Hairy buffering mechanism for grep" input buffering code in the grep source file grep-3.3/src/grep.c, then you tell me why it's not a runtime variable parameter. In other words, the i

bug#32073: Improvements in Grep (Bug#32073)

2020-01-01 Thread arnold
As a quite serious question, how is someone writing user-level code supposed to be able to figure out the right buffer size for a particular file, and to do so portably? ("Show me the code.") Gawk bases its reads on the st_blksize member in struct stat. That will typically be something like 4K -
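As a rough illustration of the st_blksize approach described here, the sketch below asks the OS for a file's preferred I/O block size and uses it as the read size. It is written in Python for brevity and is not gawk's actual code; the helper names `preferred_block_size` and `count_bytes` and the 4096-byte fallback are assumptions made for this example.

```python
import os
import tempfile


def preferred_block_size(path, fallback=4096):
    """Return the preferred I/O block size the OS reports for path.

    st_blksize is not available on every platform, so fall back to an
    assumed default when it is missing or zero.
    """
    st = os.stat(path)
    size = getattr(st, "st_blksize", 0)
    return size if size > 0 else fallback


def count_bytes(path):
    """Read path in chunks of the preferred block size; return total bytes."""
    block = preferred_block_size(path)
    total = 0
    # Unbuffered raw I/O, so each read() requests at most `block` bytes.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block)
            if not chunk:
                break
            total += len(chunk)
    return total


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 100_000)
    try:
        print(preferred_block_size(tmp.name), count_bytes(tmp.name))
    finally:
        os.unlink(tmp.name)
```

The value reported this way describes the filesystem's preference, not the stripe geometry of an underlying RAID array, which is the gap the thread is arguing about.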

bug#32073: Improvements in Grep (Bug#32073)

2020-01-01 Thread Sergiu Hlihor
This topic is getting more and more frustrating. If you rely on OS, then you are at the mercy of whatever read ahead configuration you have. And read ahead is typically 128KB so does not help that much. A HDD RAID 10 array with 12 disks and a strip size of 128KB reaches the maximum read throughput

bug#32073: Improvements in Grep (Bug#32073)

2019-12-31 Thread Paul Eggert
> This makes me think we should follow Coreutils' lead[0] and increase > grep's initial buffer size from 32KiB, probably to 128KiB. I see that Jim later installed a patch increasing it to 96 KiB. Whatever number is chosen, it's "wrong" for some configuration. And I suppose the particular configur

bug#32073: Improvements in Grep

2018-07-06 Thread Sergiu Hlihor
To add, the increase to 128KiB is good, but for RAID arrays with light to medium load, this is not sufficient. In a system without any load, the HDD can read ahead and always serve the next request from buffer thus reading at full sequential speed of ~200MB/s. In a RAID 10 configuration with 12 hd

bug#32073: Improvements in Grep

2018-07-06 Thread Jim Meyering
On Fri, Jul 6, 2018 at 9:26 AM, Sergiu Hlihor wrote: > Hello, > I'm using grep over Ubuntu Server 14.04 (Grep version 2.16). While > grepping over large files I've noticed Grep is painfully slow. The > bottleneck seems to be the read block which is extremely low (looks like > 64KB). For large

bug#32073: Improvements in Grep

2018-07-06 Thread Dennis Clarke
On 07/06/2018 06:06 PM, Paul Eggert wrote: Sergiu Hlihor wrote: Given my tests for such scenarios, a read block size of at least 512KB would be way more efficient. Does stdio do this already? If not, why not? How could grep reasonably configure a good block size? This seems to be a very spe

bug#32073: Improvements in Grep

2018-07-06 Thread Paul Eggert
Sergiu Hlihor wrote: Given my tests for such scenarios, a read block size of at least 512KB would be way more efficient. Does stdio do this already? If not, why not? How could grep reasonably configure a good block size?

bug#32073: Improvements in Grep

2018-07-06 Thread Sergiu Hlihor
Hello, I'm using grep over Ubuntu Server 14.04 (Grep version 2.16). While grepping over large files I've noticed Grep is painfully slow. The bottleneck seems to be the read block which is extremely low (looks like 64KB). For large files residing over big HDD RAID arrays, this request barely re