OK, thanks for the input.
Arnold
Sergiu Hlihor wrote:
> Hi Arnold,
> Annoying in the sense that you have to specify it with every usage. In a
> company where you have 10+ developers grepping over various logs, each one
> has to remember to add the extra parameter. Easier would be to have some
>
Hi Arnold,
Annoying in the sense that you have to specify it with every usage. In a
company where you have 10+ developers grepping over various logs, each one
has to remember to add the extra parameter. Easier would be to have some
kind of global configuration that the system admin can set and developers
don't have to remember.
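Just to make the idea concrete (grep has no such knob today; GREP_BUFSIZE
below is purely hypothetical), a minimal C sketch of an admin-settable
override, e.g. exported once from /etc/profile:

/* Sketch only: GREP_BUFSIZE is a hypothetical variable, not an existing
   grep feature.  The admin exports it system-wide; every invocation picks
   it up without any extra flags. */
#include <stdio.h>
#include <stdlib.h>

static size_t pick_bufsize(size_t builtin_default)
{
    const char *s = getenv("GREP_BUFSIZE");      /* hypothetical override */
    if (s != NULL) {
        char *end;
        unsigned long v = strtoul(s, &end, 10);
        if (*end == '\0' && v > 0)
            return (size_t) v;                   /* e.g. 1048576 for 1 MiB */
    }
    return builtin_default;                      /* compiled-in default */
}

int main(void)
{
    printf("read buffer: %zu bytes\n", pick_bufsize(96 * 1024));
    return 0;
}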
Hi.
Sergiu Hlihor wrote:
> Hi Arnold,
> If AWKBUFSIZE translates to disk IO request size then it is already what
> is needed. However, it's a little annoying.
How would you make it less annoying?
Thanks,
Arnold
Hi Arnold,
If AWKBUFSIZE translates to disk IO request size then it is already what
is needed. However, it's a little annoying.
Regarding optimal settings, the benchmark actually tells you what is
optimal. Let's assume grep or any other tool can process 3GB/s in memory.
If your device can serve 5
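To make the implied selection rule concrete in code: pick the smallest
request size whose measured device throughput already exceeds the tool's
in-memory processing rate. A sketch, where the numbers are made-up
placeholders standing in for a real benchmark table:

/* Given benchmark samples of read throughput per request size (sorted by
   size), return the smallest request size that already out-runs the tool's
   processing rate. */
#include <stdio.h>
#include <stddef.h>

struct sample { size_t block_size; double mb_per_s; };

static size_t smallest_sufficient(const struct sample *s, size_t n,
                                  double need_mb_per_s)
{
    for (size_t i = 0; i < n; i++)
        if (s[i].mb_per_s >= need_mb_per_s)
            return s[i].block_size;
    return s[n - 1].block_size;   /* nothing suffices; take the largest */
}

int main(void)
{
    const struct sample bench[] = {      /* placeholder measurements */
        {   64 * 1024,  900.0 },
        {  256 * 1024, 2200.0 },
        { 1024 * 1024, 3800.0 },
    };
    double processing = 3000.0;          /* tool handles ~3 GB/s in memory */
    printf("use %zu-byte reads\n",
           smallest_sufficient(bench, 3, processing));
    return 0;
}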
On Wed, Jan 1, 2020 at 5:04 PM Sergiu Hlihor wrote:
> The system for which this hurts me the most is an Ubuntu 14.04 where I'd need
> to run it as a separate binary. As I'm not familiar with the way it's built,
> are there any guidelines on how to build it from source? I'd happily build it
> with
Hi Jim,
The system for which this hurts me the most is an Ubuntu 14.04 where I'd
need to run it as a separate binary. As I'm not familiar with the way it's
built, are there any guidelines on how to build it from source? I'd happily
build it with ever larger block sizes and test.
On Wed, Jan 1, 2020 at 12:04 PM Sergiu Hlihor wrote:
> Paul, I have to correct you. On a production server you usually have a mix of
> applications, often including databases. For databases, read ahead just
> wastes IO, since database access patterns are usually random reads.
From my old Unix fart viewpoint, Paul (the other Paul)
is herding a hundred GNU cats, small command line utilities,
many of which date their origins back to the 1970s, many of
which have over the years grown their own internal I/O routines
with specific performance specializations, but few of which
On 1/1/20 12:04 PM, Sergiu Hlihor wrote:
> That's why the best is to be application specific
That doesn't mean that one should have to modify every application. One could
instead modify the OS so that it uses different read-ahead heuristics for
different classes of applications. This should be easier.
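One existing hook in that direction, as a sketch rather than anything grep
does today: the application itself can declare its access class with
posix_fadvise and let the kernel adjust readahead per file. The call is
advisory, and the exact effect described in the comments is Linux-specific:

/* Per-class readahead hinting: SEQUENTIAL roughly enlarges the readahead
   window for this descriptor on Linux, RANDOM disables it. */
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>

enum io_class { SEQUENTIAL_SCAN, RANDOM_LOOKUP };

static void declare_io_class(int fd, enum io_class c)
{
    int advice = (c == SEQUENTIAL_SCAN) ? POSIX_FADV_SEQUENTIAL
                                        : POSIX_FADV_RANDOM;
    int err = posix_fadvise(fd, 0, 0, advice);   /* offset 0, len 0 = whole file */
    if (err != 0)
        fprintf(stderr, "posix_fadvise: error %d\n", err);
}

int main(int argc, char **argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s FILE\n", argv[0]); return 1; }
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }
    declare_io_class(fd, SEQUENTIAL_SCAN);       /* a grep-like scanner */
    /* ... read the file sequentially ... */
    return 0;
}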
Hi.
Sergiu Hlihor wrote:
> Arnold, there is no need to write user code, it is already done in
> benchmarks. One of the standard benchmarks when testing HDDs and SSDs is
> read throughput vs block size and at different queue depths.
I think you're misunderstanding me, or I am misunderstanding you.
Paul, I have to correct you. On a production server you usually have a mix
of applications, often including databases. For databases, read ahead just
wastes IO, since database access patterns are usually random reads. Here
the best is actually to disable read ahead completely. In
fact, I
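For completeness, the usual way a database sidesteps readahead entirely is
not a kernel setting at all: it opens its data files with O_DIRECT and does
its own caching. A minimal Linux-specific sketch (the 4 KiB alignment is an
assumption; check the device's logical block size):

/* O_DIRECT bypasses the page cache and kernel readahead; buffer, offset and
   length must all be aligned to the device's logical block size. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s FILE\n", argv[0]); return 1; }

    int fd = open(argv[1], O_RDONLY | O_DIRECT);   /* Linux-specific flag */
    if (fd < 0) { perror("open"); return 1; }

    void *buf;
    size_t align = 4096, len = 64 * 1024;          /* assume 4 KiB blocks */
    if (posix_memalign(&buf, align, len) != 0) { close(fd); return 1; }

    ssize_t n = pread(fd, buf, len, 0);            /* aligned offset */
    if (n < 0) perror("pread");
    else printf("read %zd bytes without page cache or readahead\n", n);

    free(buf);
    close(fd);
    return 0;
}

At the block-device level an admin can get a similar effect for everything
on the array with blockdev --setra 0 on the relevant device.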
On 1/1/20 1:15 AM, Sergiu Hlihor wrote:
> If you rely on the OS, then
> you are at the mercy of whatever read-ahead configuration you have.
Right, and whatever changes you make to the OS and its read-ahead configuration
will work for all applications, not just for 'grep'. So, change the OS to do
that.
Arnold, there is no need to write user code, it is already done in
benchmarks. One of the standard benchmarks when testing HDDs and SSDs is
read throughput vs block size and at different queue depths. Take a look
at this:
https://www.servethehome.com/wp-content/uploads/2019/12/Corsair-Force-MP600-
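A rough single-threaded version of such a benchmark in C, for anyone who
wants to reproduce it (buffered reads only, so page-cache hits will inflate
the numbers; raw device figures need O_DIRECT or dropped caches, and testing
different queue depths needs async I/O, which this sketch does not do):

/* Time sequential reads of one large file at several request sizes and
   print MB/s for each.  Point it at a file big enough that each pass takes
   at least a few seconds. */
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(int argc, char **argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s BIGFILE\n", argv[0]); return 1; }

    size_t sizes[] = { 64 << 10, 128 << 10, 512 << 10, 1 << 20, 4 << 20 };
    for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; i++) {
        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        char *buf = malloc(sizes[i]);
        if (!buf) { close(fd); return 1; }

        double t0 = now_sec(), total = 0;
        ssize_t n;
        while ((n = read(fd, buf, sizes[i])) > 0)   /* sequential full pass */
            total += n;
        double dt = now_sec() - t0;

        printf("%8zu-byte reads: %8.1f MB/s\n", sizes[i], total / dt / 1e6);
        free(buf);
        close(fd);
    }
    return 0;
}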
>> Why not just expose the read buffer size as a configurable parameter ...
Take a look at the (and I quote) "Hairy buffering mechanism for grep"
input buffering code in the grep source file grep-3.3/src/grep.c, then
you tell me why it's not a runtime-configurable parameter.
In other words, the i
As a quite serious question, how is someone writing user-level code
supposed to be able to figure out the right buffer size for a particular
file, and to do so portably? ("Show me the code.")
Gawk bases its reads on the st_blksize member in struct stat. That will
typically be something like 4K -
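For anyone following along, this is roughly what basing the read size on
st_blksize looks like; a sketch, not gawk's actual code. On ext4 it usually
reports 4096, which is exactly why it is a poor guide for RAID arrays:

#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s FILE\n", argv[0]); return 1; }

    struct stat st;
    if (stat(argv[1], &st) != 0) { perror("stat"); return 1; }

    /* Preferred I/O size reported by the filesystem, with a fallback. */
    size_t bufsize = st.st_blksize > 0 ? (size_t) st.st_blksize : 64 * 1024;
    printf("st_blksize = %ld, so a naive reader would use %zu-byte reads\n",
           (long) st.st_blksize, bufsize);
    return 0;
}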
This topic is getting more and more frustrating. If you rely on the OS, then
you are at the mercy of whatever read-ahead configuration you have. And
read ahead is typically 128KB, so it does not help that much. A HDD RAID 10
array with 12 disks and a stripe size of 128KB reaches the maximum read
throughput
> This makes me think we should follow Coreutils' lead[0] and increase
> grep's initial buffer size from 32KiB, probably to 128KiB.
I see that Jim later installed a patch increasing it to 96 KiB.
Whatever number is chosen, it's "wrong" for some configuration. And I suppose
the particular configuration
To add, the increase to 128KiB is good, but for RAID arrays with light to
medium load, this is not sufficient. In a system without any load, the HDD
can read ahead and always serve the next request from its buffer, thus reading
at full sequential speed of ~200MB/s. In a RAID 10 configuration with 12
HDDs
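To put rough numbers on that (my own arithmetic from the figures above,
assuming the 12 disks form 6 mirrored pairs striped with a 128 KiB stripe
unit):

  6 pairs x ~200 MB/s       ~= 1.2 GB/s   achievable sequential read rate
  6 x 128 KiB stripe unit    =  768 KiB   data in flight needed to touch every pair

So with 128 KiB requests and no deeper readahead, a single synchronous
reader keeps only one disk busy at a time and sees roughly single-disk
throughput; request sizes of 512 KiB or more, as suggested earlier in the
thread, are what let the array spread one read across its stripes.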
On Fri, Jul 6, 2018 at 9:26 AM, Sergiu Hlihor wrote:
> Hello,
> I'm using grep over Ubuntu Server 14.04 (Grep version 2.16). While
> grepping over large files I've noticed Grep is painfully slow. The
> bottleneck seems to be the read block size, which is extremely low (looks like
> 64KB). For large
On 07/06/2018 06:06 PM, Paul Eggert wrote:
> Sergiu Hlihor wrote:
> > Given my tests for such scenarios, a read block size of at least
> > 512KB would be way more efficient.
> Does stdio do this already? If not, why not? How could grep reasonably
> configure a good block size?
This seems to be a very specific use case.
Sergiu Hlihor wrote:
> Given my tests for such scenarios, a read block size of at least
> 512KB would be way more efficient.
Does stdio do this already? If not, why not? How could grep reasonably configure
a good block size?
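On the stdio part of that question: glibc's stdio typically sizes its stream
buffer from st_blksize (so around 4 KiB), but a program that does read
through stdio can enlarge the buffer itself with setvbuf. A sketch; note
that grep does its own input buffering rather than using stdio (see the
"hairy buffering" message above), so this only illustrates the stdio side:

/* setvbuf must be called after fopen but before any other operation on the
   stream; the buffer must stay valid until the stream is closed. */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s FILE\n", argv[0]); return 1; }

    FILE *fp = fopen(argv[1], "r");
    if (!fp) { perror("fopen"); return 1; }

    size_t bufsize = 512 * 1024;                 /* e.g. 512 KiB instead of the default */
    char *buf = malloc(bufsize);
    if (!buf || setvbuf(fp, buf, _IOFBF, bufsize) != 0) {
        fprintf(stderr, "setvbuf failed\n");
        return 1;
    }

    /* ... fgets()/fread() on fp now issue reads of up to 512 KiB ... */
    fclose(fp);
    free(buf);
    return 0;
}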
Hello,
I'm using grep over Ubuntu Server 14.04 (Grep version 2.16). While
grepping over large files I've noticed Grep is painfully slow. The
bottleneck seems to be the read block size, which is extremely low (looks like
64KB). For large files residing on big HDD RAID arrays, this request
barely re