In redwood (http://github.com/andybalholm/redwood), one thing I do is check each URL against a (potentially very large) set of regular expressions. Since these regular expressions generally contain a fair amount of literal text, I analyze each one to see if I can find a set of strings such that every URL matching that regular expression must contain at least one of those strings. (This is done in the file restring.go.) I combine the sets of strings from all the regular expressions and run an Aho-Corasick string search over the combined list. The results of the string search tell me which regexps are possible matches, and I test only those against the URL.
This is all wrapped up in the regexMap type in url.go.

Andy