On 24 Jan 2007 11:07:49 -0800, Paul McGuire <[EMAIL PROTECTED]> wrote:
> On Jan 24, 10:20 am, "Johny" <[EMAIL PROTECTED]> wrote:
> > Does anyone know about a good regular expression for URL extracting?
> >
> > J.
> Google turns this up:
>
> http://geekswithblogs.net/casualjim/archive/2005/12/01/61722.aspx
>
> But I've seen other re's for this problem that are hundreds of
> characters long.
>
> -- Paul
>
> --
> http://mail.python.org/mailman/listinfo/python-list
>

These are the regexps that gnome-terminal uses for its URL
auto-recognition, and I have shamelessly stolen them for use in one of
my own apps:

    import re

    urlfinders = [
        # host (IP, scheme://host, or www./ftp. name), optional port, then a path
        re.compile("([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}|(((news|telnet|nttp|file|http|ftp|https)://)|(www|ftp)[-A-Za-z0-9]*\\.)[-A-Za-z0-9\\.]+)(:[0-9]*)?/[-A-Za-z0-9_\\$\\.\\+\\!\\*\\(\\),;:@&=\\?/~\\#\\%]*[^]'\\.}>\\),\\\"]"),
        # host and optional port only, no path
        re.compile("([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}|(((news|telnet|nttp|file|http|ftp|https)://)|(www|ftp)[-A-Za-z0-9]*\\.)[-A-Za-z0-9\\.]+)(:[0-9]*)?"),
        # local file paths starting with ~/, / or ./
        re.compile("(~/|/|\\./)([-A-Za-z0-9_\\$\\.\\+\\!\\*\\(\\),;:@&=\\?/~\\#\\%]|\\\\ )+"),
        # e-mail addresses, optionally prefixed with mailto:
        re.compile("'\\<((mailto:)|)[EMAIL PROTECTED]"),
    ]
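
For what it's worth, here is a rough sketch of how the first two
patterns could be applied to a piece of text. The names HOST, PATH,
URL_WITH_PATH, URL_HOST_ONLY and find_urls are made up for this example;
only the pattern strings come from the list above. It assumes you want
the with-path pattern to win whenever both match the same spot, and it
leaves out the file-path and mailto patterns.

```python
import re

# Pattern text copied from the two URL regexps in the post; splitting it into
# HOST and PATH is only for readability and is an assumption of this sketch.
HOST = ("([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}"
        "|(((news|telnet|nttp|file|http|ftp|https)://)|(www|ftp)[-A-Za-z0-9]*\\.)"
        "[-A-Za-z0-9\\.]+)(:[0-9]*)?")
PATH = "/[-A-Za-z0-9_\\$\\.\\+\\!\\*\\(\\),;:@&=\\?/~\\#\\%]*[^]'\\.}>\\),\\\"]"

URL_WITH_PATH = re.compile(HOST + PATH)  # same string as the first regexp
URL_HOST_ONLY = re.compile(HOST)         # same string as the second regexp


def find_urls(text, finders=(URL_WITH_PATH, URL_HOST_ONLY)):
    """Return URL-like substrings of text in order of appearance.

    Finders are tried in the order given; a match that overlaps one already
    accepted from an earlier finder is dropped.
    """
    spans = []
    for finder in finders:
        for match in finder.finditer(text):
            start, end = match.span()
            if any(start < e and s < end for s, e in spans):
                continue  # already covered by an earlier, more specific pattern
            spans.append((start, end))
    # The last character class of the path pattern accepts anything outside its
    # exclusion list, so a match can end in whitespace (e.g. a space); strip it.
    return [text[s:e].rstrip() for s, e in sorted(spans)]


if __name__ == "__main__":
    sample = ('Docs: <http://www.python.org/doc/> and "www.example.com" '
              '(see 192.168.0.1:8080/status).')
    for url in find_urls(sample):
        print(url)
```

Trying the longer pattern first matters here: the host-only regexp also
matches the front of any URL that has a path, so running it on its own
would cut those URLs off at the host name.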