On 24 Jan 2007 11:07:49 -0800, Paul McGuire <[EMAIL PROTECTED]> wrote:
> On Jan 24, 10:20 am, "Johny" <[EMAIL PROTECTED]> wrote:
> > Does anyone know of a good regular expression for extracting URLs?
> >
> > J.
> Google turns this up:
>
> http://geekswithblogs.net/casualjim/archive/2005/12/01/61722.aspx
>
> But I've seen other re's for this problem that are hundreds of
> characters long.
>
> -- Paul
>
> --
> http://mail.python.org/mailman/listinfo/python-list
>

These are the regexps that gnome-terminal uses for its URL
auto-recognition, and I have shamelessly stolen them for use in one of
my own apps:

import re

urlfinders = [
    # scheme/www host or dotted-quad IP, optional port, then a path
    re.compile("([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}|(((news|telnet|nttp|file|http|ftp|https)://)|(www|ftp)[-A-Za-z0-9]*\\.)[-A-Za-z0-9\\.]+)(:[0-9]*)?/[-A-Za-z0-9_\\$\\.\\+\\!\\*\\(\\),;:@&=\\?/~\\#\\%]*[^]'\\.}>\\),\\\"]"),
    # same as above, but host (and optional port) only, no path
    re.compile("([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}|(((news|telnet|nttp|file|http|ftp|https)://)|(www|ftp)[-A-Za-z0-9]*\\.)[-A-Za-z0-9\\.]+)(:[0-9]*)?"),
    # local file paths starting with ~/, / or ./
    re.compile("(~/|/|\\./)([-A-Za-z0-9_\\$\\.\\+\\!\\*\\(\\),;:@&=\\?/~\\#\\%]|\\\\ )+"),
    re.compile("'\\<((mailto:)|)[EMAIL PROTECTED]"),
]
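
For what it's worth, here is a rough sketch of how one of these might be
applied to a chunk of text. It reuses only the host-only pattern from the
list above; the names HOST_RE, extract_urls and sample are made up for this
example and are not from gnome-terminal or my app. Treat it as an
illustration under those assumptions, not the way any particular program
does it.

```python
import re

# Host-only pattern copied from the list above (scheme or www/ftp prefix,
# host characters, optional port). HOST_RE is a hypothetical name.
HOST_RE = re.compile("([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}|(((news|telnet|nttp|file|http|ftp|https)://)|(www|ftp)[-A-Za-z0-9]*\\.)[-A-Za-z0-9\\.]+)(:[0-9]*)?")


def extract_urls(text, pattern=HOST_RE):
    """Return the full text of every non-overlapping match in text."""
    # The pattern contains capturing groups, so findall() would return
    # tuples of groups; finditer() + group(0) yields the whole match.
    return [m.group(0) for m in pattern.finditer(text)]


sample = "Docs live at http://www.python.org and www.example.com:8080 today"
print(extract_urls(sample))
```

Note that the host character class includes ".", so a URL followed directly
by a sentence-ending period will pick that period up as well; you may want
to strip trailing punctuation from the results.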