On Wed, Jul 27, 2016 at 9:42 AM, Andy Balholm <andybalh...@gmail.com> wrote:

> in github.com/andybalholm/redwood, one thing I do is to check each URL
> against a (potentially very large) set of regular expressions. Since these
> regular expressions generally contain fairly significant amounts of literal
> text, I analyze each regular expression to see if I can find a set of
> strings such that every URL that matches that regular expression must
> contain at least one of those strings. (This is done in the file
> restring.go.) I combine the sets of strings from all the regular
> expressions, and do an Aho-Corasick string search based on that list of
> strings. From the results of the string search, I know which regexps are
> possible matches, and I test those against the URL.
>

By coincidence, I solved this exact problem as well earlier this week,
and my code is almost the same as yours (and rsc's) -- especially the
deconstruction of the syntax.Regexp AST.

I used Rabin-Karp for the search. Thanks for the pointer to Aho-Corasick;
that's a very nice algorithm.

-Caleb
