On Wed, Jul 27, 2016 at 9:42 AM, Andy Balholm <andybalh...@gmail.com> wrote:
> in github.com/andybalholm/redwood, one thing I do is to check each URL
> against a (potentially very large) set of regular expressions. Since these
> regular expressions generally contain fairly significant amounts of literal
> text, I analyze each regular expression to see if I can find a set of
> strings such that every URL that matches that regular expression must
> contain at least one of those strings. (This is done in the file
> restring.go.) I combine the sets of strings from all the regular
> expressions, and do an Aho-Corasick string search based on that list of
> strings. From the results of the string search, I know which regexps are
> possible matches, and I test those against the URL.

By coincidence, I solved this exact problem earlier this week as well, and
my code is almost the same as yours (and rsc's), especially the
deconstruction of the syntax.Regexp AST. I used Rabin-Karp for the search.

Thanks for the pointer to Aho-Corasick; that's a very nice algorithm.

-Caleb
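
For anyone following along, here is a rough sketch of the prefilter idea described above. It is not the code from restring.go, nor mine. The names requiredStrings, shortest, filter, newFilter and possible are made up for illustration, the AST walk handles only a handful of ops, and a plain strings.Contains loop stands in for the Aho-Corasick or Rabin-Karp search, which a real implementation would use to scan the URL once for all literals.

```go
package main

import (
	"fmt"
	"regexp"
	"regexp/syntax"
	"strings"
)

// requiredStrings returns a set of literals such that every string matched
// by re contains at least one of them. It returns nil when this simplified
// analysis cannot find such a set.
func requiredStrings(re *syntax.Regexp) []string {
	switch re.Op {
	case syntax.OpLiteral:
		if re.Flags&syntax.FoldCase != 0 {
			return nil // case-insensitive literals are not handled here
		}
		return []string{string(re.Rune)}
	case syntax.OpCapture, syntax.OpPlus:
		return requiredStrings(re.Sub[0]) // the sub-expression occurs at least once
	case syntax.OpRepeat:
		if re.Min >= 1 {
			return requiredStrings(re.Sub[0])
		}
	case syntax.OpConcat:
		// Every part must appear, so any part's set is valid; keep the
		// set whose shortest literal is longest.
		var best []string
		for _, sub := range re.Sub {
			if s := requiredStrings(sub); shortest(s) > shortest(best) {
				best = s
			}
		}
		return best
	case syntax.OpAlternate:
		// Any branch may match, so every branch must contribute.
		var all []string
		for _, sub := range re.Sub {
			s := requiredStrings(sub)
			if s == nil {
				return nil
			}
			all = append(all, s...)
		}
		return all
	}
	return nil
}

// shortest returns the length of the shortest string in set, or 0 if empty.
func shortest(set []string) int {
	n := 0
	for i, s := range set {
		if i == 0 || len(s) < n {
			n = len(s)
		}
	}
	return n
}

// filter pairs each compiled regexp with its required literals.
// needs[i] == nil means res[i] must always be tested.
type filter struct {
	res   []*regexp.Regexp
	needs [][]string
}

func newFilter(patterns []string) (*filter, error) {
	f := &filter{}
	for _, p := range patterns {
		tree, err := syntax.Parse(p, syntax.Perl) // same flags regexp.Compile uses
		if err != nil {
			return nil, err
		}
		f.res = append(f.res, regexp.MustCompile(p))
		f.needs = append(f.needs, requiredStrings(tree))
	}
	return f, nil
}

// possible reports whether url could match a regexp with the given literals.
func possible(needs []string, url string) bool {
	if needs == nil {
		return true
	}
	for _, s := range needs {
		if strings.Contains(url, s) {
			return true
		}
	}
	return false
}

// match returns the indexes of the patterns that match url, running the
// regexp only when the literal prefilter says a match is possible.
func (f *filter) match(url string) []int {
	var hits []int
	for i, re := range f.res {
		if possible(f.needs[i], url) && re.MatchString(url) {
			hits = append(hits, i)
		}
	}
	return hits
}

func main() {
	f, err := newFilter([]string{`example\.com/ads/.*`, `(banner|popup)\d+\.gif`, `\d{5,}`})
	if err != nil {
		panic(err)
	}
	fmt.Println(f.match("http://example.com/ads/banner42.gif"))
}
```

The third pattern has no usable literal, so it gets a nil set and is always run against the URL; that is the fallback any scheme like this needs for regexps made up entirely of character classes.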