Anybody know of a good regex to parse HTML links from HTML code? The one I am currently using seems to be cutting off the last letter of some links and returning links like
http://somesite.co or http://somesite.ph.

The code I am using is:

    import re
    import urllib

    regex = r'<a href=["|\']([^"|\']+)["|\']>'
    page_text = urllib.urlopen('http://somesite.com')
    page_text = page_text.read()
    # search the downloaded page text (page_text, not an undefined name)
    links = re.findall(regex, page_text, re.IGNORECASE)

--
http://mail.python.org/mailman/listinfo/python-list
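To make the problem easier to reproduce without a live page, here is a minimal sketch that runs the same pattern against a small HTML string. The sample markup and the names SAMPLE_HTML, LinkCollector, find_links_regex and find_links_parser are made up for illustration and are not taken from the real page. The sketch is written for Python 3 and also shows one possible alternative, the standard library's html.parser, which may be more tolerant of attribute order than a regex; it is not meant as a definitive fix.

```python
import re
from html.parser import HTMLParser

# Same pattern as in the post. Inside [...] the "|" is a literal pipe,
# not alternation, so the class matches '"', '|' or "'".
LINK_RE = re.compile(r'<a href=["|\']([^"|\']+)["|\']>', re.IGNORECASE)

# Hypothetical sample markup, made up for illustration only.
SAMPLE_HTML = (
    '<a href="http://somesite.com">one</a>\n'
    "<a href='http://somesite.ph/page'>two</a>\n"
    '<a class="nav" href="http://somesite.com/about">three</a>\n'
)


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, regardless of attribute order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with names lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def find_links_regex(html):
    """Return the href values matched by the pattern from the post."""
    return LINK_RE.findall(html)


def find_links_parser(html):
    """Return the href values of all <a> tags found by html.parser."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


if __name__ == "__main__":
    print(find_links_regex(SAMPLE_HTML))
    print(find_links_parser(SAMPLE_HTML))
```

On this sample the pattern only matches anchors where href is the first and only attribute, so the third link is skipped by the regex but picked up by the parser.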