import re

url = re.compile(r"^((ht|f)tp(s?)\:\/\/|~/|/)?([\w]+:\w+@)?([a-zA-Z]{1}([\w\-]+\.)+([\w]{2,5}))(:[\d]{1,5})?((/?\w+/)+|/?)(\w+\.[\w]{3,4})?((\?\w+=\w+)?(&\w+=\w+)*)?")

Why isn't this regexp catching the URL in something like:

    <link rel="alternate" type="application/rss+xml" title="Python Screencasts" href="http://www.showmedo.com/latestVideoFeed/rss2.0?tag=python" />

This is the code I run:

    import urllib

    site = urllib.urlopen("http://www.python.org")
    for row in site:
        obj = url.search(row)
        if obj is not None:
            print "url: ", obj.group()

I know the pattern works, because it catches www.hello.com in a text file, and I can catch the email addresses on websites with another regexp. search and match yield the same results. But when something like href= comes in front of the URL, it doesn't work.

I see now that it has to match at the beginning of the row or something, because "hi www.google.com" doesn't match but "www.google.com hi" does. I thought a regexp would search a row/file and report an occurrence wherever it finds one, so that a regexp of "lo" would match in lopez.
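
A small check of the beginning-of-row observation, as a sketch only. It assumes the leading `^` in the pattern is what ties the match to the start of each row: without the MULTILINE flag, `^` only matches at the start of the string, so `search` has nowhere else to try and ends up behaving like `match`. The names `URL_BODY`, `anchored`, `unanchored` and `first_url` are made up for this example, and it is written for Python 3 rather than the Python 2 used above.

```python
import re

# The URL pattern from the post, split over three lines, minus its leading "^".
URL_BODY = (
    r"((ht|f)tp(s?)\:\/\/|~/|/)?([\w]+:\w+@)?"
    r"([a-zA-Z]{1}([\w\-]+\.)+([\w]{2,5}))(:[\d]{1,5})?"
    r"((/?\w+/)+|/?)(\w+\.[\w]{3,4})?((\?\w+=\w+)?(&\w+=\w+)*)?"
)

anchored = re.compile("^" + URL_BODY)  # the pattern as posted
unanchored = re.compile(URL_BODY)      # hypothetical variant without "^"


def first_url(pattern, row):
    """Return the text of the first match of pattern in row, or None."""
    m = pattern.search(row)
    return m.group() if m else None


rows = [
    "www.google.com hi",
    "hi www.google.com",
    '<link rel="alternate" type="application/rss+xml" '
    'title="Python Screencasts" '
    'href="http://www.showmedo.com/latestVideoFeed/rss2.0?tag=python" />',
]

for row in rows:
    # Left column: pattern as posted. Right column: same pattern without "^".
    print(repr(first_url(anchored, row)), "|", repr(first_url(unanchored, row)))
```

With the `^` in place, only the row that starts with the URL gives a match. Without it, `search` scans along the row and also reports a match in the other two rows. For the `<link>` row the match begins at `http://www.showmedo.com`, although this pattern does not necessarily capture the whole URL.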