Hi, I am trying to construct a regular expression using the re module that matches 1. URLs on my hostname, 2. absolute-from-the-root URLs (those starting with "/", including just "/"), and 3. relative URLs.
Basically, I want the pattern not to match URLs that are not on my host. The following statement satisfies numbers 1 and 2, but not 3:

    line = re.sub(r'(href=")(http[s]?://'+hostname+'[/]?|/)([^"]*?)(")',r'\1\2\3'+sInfo+r'\4',line)

An improvement that also partially satisfies number 3 is:

    line = re.sub(r'(href=")(http[s]?://'+hostname+'[/]?|/|[^h][^t][^t][^p][^:][^/][^/])([^"]*?)(")',r'\1\2\3'+sInfo+r'\4',line)

This is not complete, because if the relative URL is shorter than seven characters, then it will not match. Any suggestions? Thanks.
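One possible direction, offered only as a sketch: replace the seven negated character classes with a zero-width negative lookahead, so that a relative URL of any length is accepted as long as it does not start with a scheme such as "http:" or with a slash. The function name append_to_local_hrefs and the sample values below are hypothetical. The sketch also assumes it is acceptable to escape the hostname with re.escape, to reject protocol-relative "//host" links, and to require "/" or the closing quote right after the hostname; none of those three choices is in the original statements.

```python
import re

def append_to_local_hrefs(line, hostname, s_info):
    """Append s_info to href values that point at this host (hypothetical helper)."""
    pattern = re.compile(
        r'(href=")'
        r'('
        # 1. absolute URL on our host; the lookahead stops "host.evil.org" matching
        r'https?://' + re.escape(hostname) + r'(?=[/"])/?'
        # 2. root-relative URL, but not a protocol-relative "//other.host" link
        r'|/(?!/)'
        # 3. relative URL: matches nothing itself, only checks that the value
        #    does not begin with a scheme ("http:", "mailto:", ...) or a slash
        r'|(?![a-zA-Z][a-zA-Z0-9+.-]*:|/)'
        r')'
        r'([^"]*?)(")'
    )
    # A function replacement avoids backslash surprises if s_info contains "\"
    return pattern.sub(
        lambda m: m.group(1) + m.group(2) + m.group(3) + s_info + m.group(4),
        line,
    )

if __name__ == "__main__":
    for href in ('a', '/', 'http://example.com/x', 'http://other.org/x'):
        print(append_to_local_hrefs('<a href="%s">' % href, 'example.com', '?sid=42'))
```

Because the third alternative consumes no characters, short relative URLs such as "a" are handled the same way as long ones. Like the original statements, this still appends sInfo after any "#fragment" in the URL, which may or may not be what is wanted.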