Regex Help

Support Desk Mon, 22 Sep 2008 09:39:57 -0700

Does anybody know of a good regex to parse HTML links from HTML code? The one I
am currently using seems to cut off the last letter of some links, returning
links like


http://somesite.co

or http://somesite.ph

The code I am using is:


import re
import urllib

regex = r'<a href=["|\']([^"|\']+)["|\']>'

# Python 2: urllib.urlopen returns a file-like object
page_text = urllib.urlopen('http://somesite.com')
page_text = page_text.read()

links = re.findall(regex, page_text, re.IGNORECASE)
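
For comparison, here is a minimal sketch of a more tolerant pattern. It is not a confirmed fix for the truncation described above. Inside a character class, `|` is a literal pipe rather than alternation, so the sketch drops it, matches the opening quote once, and captures up to the same closing quote. It also allows other attributes around href. The names SAMPLE_HTML, LINK_RE and extract_links are hypothetical, and the sample markup is made up for illustration. A regex like this can still be fooled by unusual markup (for example an attribute named data-href), so a real HTML parser may be the safer choice.

```python
import re

# Hypothetical sample input; not taken from the original post.
SAMPLE_HTML = '''
<a href="http://somesite.com">one</a>
<A HREF='http://somesite.ph' class="ext">two</A>
<a class="x" href="http://somesite.com/page?a=1|b=2">three</a>
'''

# Group 1 remembers which quote opened the value; \1 requires the same
# quote to close it. Group 2 is the URL itself. [^>]*? keeps the search
# for href inside the current tag.
LINK_RE = re.compile(
    r'<a\s[^>]*?href\s*=\s*(["\'])(.*?)\1',
    re.IGNORECASE | re.DOTALL,
)


def extract_links(html):
    """Return the href values of all <a> tags found in html."""
    return [match.group(2) for match in LINK_RE.finditer(html)]


print(extract_links(SAMPLE_HTML))
```

The sketch runs on a string held in memory, so it can be tried without fetching a page; the text returned by page_text.read() could be passed to extract_links in the same way.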



--
http://mail.python.org/mailman/listinfo/python-list
