On Sun, 22 Aug 2010, Dag-Erling Smørgrav wrote:
"Sean C. Farley" <s...@freebsd.org> writes:
Some algorithms:
1. http://en.wikipedia.org/wiki/Aho-Corasick_string_matching_algorithm
Aho-Corasick is not really a search algorithm, but an algorithm for
constructing a table-driven finite state machine that will match
either of the search strings you fed it. I believe it is less
efficient than Boyer-Moore for small numbers of search terms, since it
scans the entire input. I don't see the point in using it in grep,
because grep already has an algorithm for constructing finite state
machines: regcomp(3).
especially those that could be useful for fgrep functionality
I was mainly talking about algorithms useful for the fgrep portion
within FreeGrep. fgrep would run (still runs?) over the same text for
each pattern.
Therefore, Aho–Corasick had to be mentioned for the reason referenced
within the link:
The Aho–Corasick string matching algorithm formed the basis of the
original Unix command fgrep.
2. http://en.wikipedia.org/wiki/Rabin-Karp_string_search_algorithm
It doesn't seem to compare favorably to the far older Aho-Corasick.
It uses slightly less memory, but memory is usually not an issue with
grep.
I agree, yet I like to keep alternative algorithms in mind in case a
variant would be useful.
4. GLIMPSE: http://webglimpse.net/pubs/TR94-17.pdf (Boyer-Moore
variant)
Glimpse is a POS... and not really comparable, because grep is
designed to search for a single search string in multiple texts, while
glimpse is designed to search a large amount of text over and over
with different search strings. I believe it uses suffix tables to
construct its index, and Boyer-Moore only to locate specific matches,
since the index lists only files, not exact positions. For anything
other than fixed strings, it reverts to agrep, but I assume (I haven't
looked at the code) that if the regexp has one or more fixed
components, it uses those to narrow the search space before running
agrep.
Glimpse may be a POS; I have not used it personally. I only noted its
algorithm for possible use within fgrep.
Of course, there may be much better algorithms out there to boost
fgrep's speed, but these are what I had found at one time.
Sean
--
s...@freebsd.org
_______________________________________________
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"