Re: why GNU grep is fast

Sean C. Farley Sun, 22 Aug 2010 18:30:07 -0700

On Sun, 22 Aug 2010, Dag-Erling Smørgrav wrote:

"Sean C. Farley" <[email protected]> writes:
Some algorithms:
1. http://en.wikipedia.org/wiki/Aho-Corasick_string_matching_algorithm
Aho-Corasick is not really a search algorithm, but an algorithm for constructing a table-driven finite state machine that will match either of the search strings you fed it. I believe it is less efficient than Boyer-Moore for small numbers of search terms, since it scans the entire input. I don't see the point in using it in grep, because grep already has an algorithm for constructing finite state machines: regcomp(3).


especially those that could be useful for fgrep functionality

I was mainly talking about algorithms useful for the fgrep portion within FreeGrep. fgrep would run (still runs?) over the same text for each pattern.

Therefore, Aho–Corasick had to be mentioned for the reason referenced within the link:

    The Aho–Corasick string matching algorithm formed the basis of the
    original Unix command fgrep.
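
To make the difference from one-scan-per-pattern concrete, here is a minimal Python sketch of the Aho–Corasick idea: build one table-driven state machine from all the fixed strings, then walk the text once. This is only an illustration under my own assumptions; the names build_automaton and search are hypothetical and are not taken from fgrep, FreeGrep, or GNU grep.

```python
from collections import deque

# Illustrative sketch only: these names are hypothetical, not from any grep.

def build_automaton(patterns):
    """Build goto/fail/output tables for a set of fixed strings."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # fail[state] -> state to fall back to on a mismatch
    out = [[]]    # out[state] -> patterns that end at this state

    # Phase 1: insert every pattern into a trie.
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)

    # Phase 2: breadth-first pass to fill in the failure links.
    # Depth-1 states keep fail == 0 (the root).
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the fallback state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def search(text, automaton):
    """Scan text once; return (start_index, pattern) for every match."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits
```

With the patterns "he", "she", "his" and "hers", a single pass over "ushers" reports all three matches. The text is read once regardless of how many patterns were supplied, which is the property that matters for fgrep.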

2. http://en.wikipedia.org/wiki/Rabin-Karp_string_search_algorithm
It doesn't seem to compare favorably to the far older Aho-Corasick. It uses slightly less memory, but memory is usually not an issue with grep.

I agree, yet I like to keep alternative algorithms in mind in case a variant would be useful.
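
For reference, a minimal single-pattern sketch of the Rabin-Karp rolling hash in Python. The function name rabin_karp and the base and mod defaults are my own arbitrary choices for illustration, not anything taken from grep.

```python
# Illustrative sketch only: rabin_karp, base and mod are assumed names/values.

def rabin_karp(text, pattern, base=256, mod=1_000_000_007):
    """Return the start index of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # Weight of the leading character of a window: base**(m-1) mod mod.
    high = pow(base, m - 1, mod)

    # Hash the pattern and the first window of the text.
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod

    hits = []
    for i in range(n - m + 1):
        # Equal hashes can be a collision, so confirm with a real comparison.
        if p_hash == t_hash and text[i:i + m] == pattern:
            hits.append(i)
        if i < n - m:
            # Roll the window: drop text[i], append text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * base
                      + ord(text[i + m])) % mod
    return hits
```

The only state carried along is one hash per pattern plus the rolling hash of the current window, which is where the memory saving comes from. Like Aho-Corasick, it still touches every character of the input.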

4. GLIMPSE:  http://webglimpse.net/pubs/TR94-17.pdf (Boyer-Moore variant)
Glimpse is a POS... and not really comparable, because grep is designed to search for a single search string in multiple texts, while glimpse is designed to search a large amount of text over and over with different search strings. I believe it uses suffix tables to construct its index, and Boyer-Moore only to locate specific matches, since the index lists only files, not exact positions. For anything other than fixed strings, it reverts to agrep, but I assume (I haven't looked at the code) that if the regexp has one or more fixed components, it uses those to narrow the search space before running agrep.

Glimpse may be a POS; I have not used it personally. I only noted its algorithm for possible use within fgrep.

Of course, there may be much better algorithms out there to boost fgrep's speed, but these are what I had found at one time.
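
Since Boyer-Moore keeps coming up, here is a minimal Python sketch of the skipping idea behind it. Note that this is the simpler Boyer-Moore-Horspool variant (bad-character shifts only), not full Boyer-Moore and not whatever variant Glimpse or GNU grep actually uses; the name horspool is hypothetical.

```python
# Illustrative sketch only: Boyer-Moore-Horspool, a simplified relative of
# Boyer-Moore. The function name is hypothetical.

def horspool(text, pattern):
    """Return the index of the first occurrence of pattern, or -1."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return -1

    # For each character of the pattern except the last, record how far its
    # rightmost occurrence is from the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}

    pos = 0
    while pos <= n - m:
        # Compare the current window against the pattern.
        if text[pos:pos + m] == pattern:
            return pos
        # Shift based on the text character under the pattern's last
        # position; characters not in the table allow a jump of m.
        pos += shift.get(text[pos + m - 1], m)
    return -1
```

When the character under the end of the window does not occur in the pattern at all, the window jumps forward by the full pattern length, so most of the input is never examined. That is the contrast with algorithms that scan every byte.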


Sean
--
[email protected]

_______________________________________________
[email protected] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "[email protected]"
