In github.com/andybalholm/redwood, one
thing I do is check each URL against a (potentially very large) set of
regular expressions. Since these regular expressions generally contain fairly 
significant amounts of literal text, I analyze each regular expression to see 
if I can find a set of strings such that every URL that matches that regular 
expression must contain at least one of those strings. (This is done in the 
file restring.go.) I combine the sets of strings from all the regular 
expressions, and do an Aho-Corasick string search based on that list of 
strings. From the results of the string search, I know which regexps are 
possible matches, and I test those against the URL.

This is all wrapped up in the regexMap type in url.go.
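
A minimal sketch of the idea, not the actual redwood code: the type
prefilterMap, its methods, and the example patterns are all hypothetical.
Two simplifications are assumed. The required literals are supplied by hand
instead of being derived from the regexp (which is what restring.go does),
and a plain strings.Contains loop stands in for the Aho-Corasick search.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// prefilterMap is a hypothetical stand-in for the idea behind regexMap.
type prefilterMap struct {
	regexps   []*regexp.Regexp
	byLiteral map[string][]int // literal -> indexes of regexps that may need it
	always    []int            // regexps with no known literals; always tested
}

// add compiles pattern and records its required literals. The caller
// promises that every string matching pattern contains at least one of
// the literals. (Assumption: supplied by hand here, not computed.)
func (m *prefilterMap) add(pattern string, literals ...string) error {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return err
	}
	i := len(m.regexps)
	m.regexps = append(m.regexps, re)
	if len(literals) == 0 {
		m.always = append(m.always, i)
		return nil
	}
	if m.byLiteral == nil {
		m.byLiteral = make(map[string][]int)
	}
	for _, lit := range literals {
		m.byLiteral[lit] = append(m.byLiteral[lit], i)
	}
	return nil
}

// match returns the source text of every regexp that matches url.
func (m *prefilterMap) match(url string) []string {
	// Step 1: literal search to find candidate regexps.
	// (A real implementation would do one Aho-Corasick pass here.)
	candidates := make(map[int]bool)
	for _, i := range m.always {
		candidates[i] = true
	}
	for lit, indexes := range m.byLiteral {
		if strings.Contains(url, lit) {
			for _, i := range indexes {
				candidates[i] = true
			}
		}
	}
	// Step 2: run only the candidate regexps against the URL.
	var matched []string
	for i, re := range m.regexps {
		if candidates[i] && re.MatchString(url) {
			matched = append(matched, re.String())
		}
	}
	return matched
}

func main() {
	var m prefilterMap
	rules := []struct {
		pattern  string
		literals []string
	}{
		{`ads?\.example\.com/banner\d+`, []string{".example.com/banner"}},
		{`(tracker|pixel)\.example\.net`, []string{"tracker.example.net", "pixel.example.net"}},
	}
	for _, r := range rules {
		if err := m.add(r.pattern, r.literals...); err != nil {
			fmt.Println("bad pattern:", err)
			return
		}
	}
	fmt.Println(m.match("http://ad.example.com/banner42"))
}
```

The regexp test in step 2 is still needed, because containing a literal only
makes a match possible; it does not guarantee one.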

Andy
