On Mon, Aug 23, 2010 at 5:04 PM, Gabor Kovesdan wrote:
> 4, We really need a good regex library. From the comments, it seems there's
> no such in the open source world. GNU libregex isn't efficient because GNU
> grep uses those workarounds that Mike kindly pointed out. Oniguruma was
> extremely sl
On Aug 23, 2010, at 9:04 AM, Gabor Kovesdan wrote:
> Hi all,
>
> there are some consequences that we can see from the grep case. Here I'd like
> to add a summary, which raises some questions. All comments are welcome.
>
> 1, When grep entered -CURRENT and bugs were found I immediately got kind
Hi all,
there are some consequences that we can see from the grep case. Here I'd
like to add a summary, which raises some questions. All comments are
welcome.
1, When grep entered -CURRENT and bugs were found I immediately got kind
bug reports and sharp criticism, as well. According to my u
Later on, he summarizes some of the existing implementations,
including comments about the Plan 9 implementation and his own RE2,
both of which efficiently handle international text (which seems to
be a major concern of Gabor's).
I believe Gabor is considering TRE for a good replacement re
Em 2010.08.21. 4:31, Mike Haertel escreveu:
Hi Gabor,
I am the original author of GNU grep. I am also a FreeBSD user,
although I live on -stable (and older) and rarely pay attention
to -current.
However, while searching the -current mailing list for an unrelated
reason, I stumbled across some
Mike Haertel writes:
> Dag-Erling Smørgrav writes:
> > You don't really need to "isolate the containing line" unless you have
> > an actual match, do you?
> Theoretically no. However, suppose the pattern was /foo.*blah/.
> The Boyer-Moore search will be for "blah", since that's the longest
> fix
"Sean C. Farley" writes:
> Dag-Erling Smørgrav writes:
> > Aho-Corasick is not really a search algorithm, but an algorithm for
> > constructing a table-driven finite state machine that will match
> > either of the search strings you fed it. I believe it is less
> > efficient than Boyer-Moore for
On Sun, 22 Aug 2010, Tim Kientzle wrote:
On Aug 22, 2010, at 9:30 AM, Sean C. Farley wrote:
On Sun, 22 Aug 2010, Dag-Erling Smørgrav wrote:
Mike Haertel writes:
GNU grep uses the well-known Boyer-Moore algorithm, which looks
first for the final letter of the target string, and uses a lookup
On Sun, 22 Aug 2010, Dag-Erling Smørgrav wrote:
"Sean C. Farley" writes:
Some algorithms:
1. http://en.wikipedia.org/wiki/Aho-Corasick_string_matching_algorithm
Aho-Corasick is not really a search algorithm, but an algorithm for
constructing a table-driven finite state machine that will mat
Dag-Erling Smørgrav writes:
> Mike Haertel writes:
> > For example if your regex is /foo.*bar/, the initial Boyer-Moore search
> > is (probably) searching for foo.
> >
> > If the initial search succeeds, GNU grep isolates the containing line,
> > and then runs the full regex matcher on that line
Mike Haertel writes:
> For example if your regex is /foo.*bar/, the initial Boyer-Moore search
> is (probably) searching for foo.
>
> If the initial search succeeds, GNU grep isolates the containing line,
> and then runs the full regex matcher on that line to make sure.
You don't really need to "
"Sean C. Farley" writes:
> Some algorithms:
> 1. http://en.wikipedia.org/wiki/Aho-Corasick_string_matching_algorithm
Aho-Corasick is not really a search algorithm, but an algorithm for
constructing a table-driven finite state machine that will match either
of the search strings you fed it. I bel
Dag-Erling Sm�rgrav writes:
> Mike Haertel writes:
> > GNU grep uses the well-known Boyer-Moore algorithm, which looks
> > first for the final letter of the target string, and uses a lookup
> > table to tell it how far ahead it can skip in the input whenever
> > it finds a non-matching character.
On Aug 22, 2010, at 9:30 AM, Sean C. Farley wrote:
> On Sun, 22 Aug 2010, Dag-Erling Smørgrav wrote:
>> Mike Haertel writes:
>>> GNU grep uses the well-known Boyer-Moore algorithm, which looks first for
>>> the final letter of the target string, and uses a lookup table to tell it
>>> how far ahe
In article <20100822163644.gu2...@hoeg.nl> you write:
>I think that implementing a simple fgrep boils down to mmap()ing a file
>and calling memmem() on the mapping to search for the input string. Of
>course this relies on having an efficient memmem() implementation, for
>example using one of the a
On Aug 22, 2010, at 8:02 AM, Garrett Wollman wrote:
> In article <86k4nikglg@ds4.des.no> you write:
>> Mike Haertel writes:
>>> GNU grep uses the well-known Boyer-Moore algorithm, which looks
>>> first for the final letter of the target string, and uses a lookup
>>> table to tell it how far ah
* Mike Haertel wrote:
> Moreover, GNU grep AVOIDS BREAKING THE INPUT INTO LINES. Looking
> for newlines would slow grep down by a factor of several times,
> because to find the newlines it would have to look at every byte!
I think that implementing a simple fgrep boils down to mmap()ing a file
a
On Sun, 22 Aug 2010, Dag-Erling Smørgrav wrote:
Mike Haertel writes:
GNU grep uses the well-known Boyer-Moore algorithm, which looks first
for the final letter of the target string, and uses a lookup table to
tell it how far ahead it can skip in the input whenever it finds a
non-matching cha
2010/8/22 Dag-Erling Smørgrav :
> Amen. The current bottleneck in BSD grep is the memchr() that looks for
> '\n' in the input buffer.
FYI I actually have a rewritten memchr() which is faster than the
current one here:
http://people.freebsd.org/~delphij/for_review/memchr.c
Review/comments welcom
In article <86k4nikglg@ds4.des.no> you write:
>Mike Haertel writes:
>> GNU grep uses the well-known Boyer-Moore algorithm, which looks
>> first for the final letter of the target string, and uses a lookup
>> table to tell it how far ahead it can skip in the input whenever
>> it finds a non-mat
Mike Haertel writes:
> GNU grep uses the well-known Boyer-Moore algorithm, which looks
> first for the final letter of the target string, and uses a lookup
> table to tell it how far ahead it can skip in the input whenever
> it finds a non-matching character.
Boyer-Moore is for fixed search strin
That's a good read for other things as Mike, thanks for taking the
time to pass on this knowledge :)
- Original Message -
From: "Mike Haertel"
To:
Anyway, just FYI, here's a quick summary of where GNU grep gets
its speed. Hopefully you can carry these ideas over to BSD grep.
#1 tr
Hi Gabor,
I am the original author of GNU grep. I am also a FreeBSD user,
although I live on -stable (and older) and rarely pay attention
to -current.
However, while searching the -current mailing list for an unrelated
reason, I stumbled across some flamage regarding BSD grep vs GNU grep
perform
23 matches
Mail list logo