In article <86k4nikglg....@ds4.des.no> you write: >Mike Haertel <m...@ducky.net> writes: >> GNU grep uses the well-known Boyer-Moore algorithm, which looks >> first for the final letter of the target string, and uses a lookup >> table to tell it how far ahead it can skip in the input whenever >> it finds a non-matching character. > >Boyer-Moore is for fixed search strings. I don't see how that >optimization can work with a regexp search unless the regexp is so >simple that you break it down into a small number of cases with known >length and final character.
The common case of regexps used in the "grep" utility (and, for obvious reasons, universal in the "fgrep" utility) is fixed-length search strings. Even non-fixed-length regexps typically consist of one one or two variable-length parts. Matching a completely variable-length regexp is just hard, computationally, so it's OK for it to be slower. There are other tricks you can do, such as turning the anchors ^ and $ into explicit newlines in your search -- "^foo" is a very common regexp to search for, and it's really a fixed-string search for "\nfoo" which is entirely amenable to the B-M treatment. You just have to remember that a matched newline isn't part of the result. The GNU regexp library also uses the Boyer-Moore (or is it Boyer-Moore-Gosper?) strategy. -GAWollman _______________________________________________ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"