Is BeautifulSoup really better? Since I don't know either one, I would prefer to learn only one for now.

Thanks,
Vincent Davis
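On the BeautifulSoup question: an HTML parser works on the document's structure (tags and the text inside them) rather than on its exact spelling, so it keeps working if the markup around the value changes. BeautifulSoup is a third-party package, so the sketch below shows the same idea with the standard library's `html.parser.HTMLParser` instead. It is not BeautifulSoup's API. The class `CellTextCollector` and the function `parcel_id_from_html` are hypothetical names for illustration, and the sketch assumes the ID sits in a `<td>` whose text starts with "Parcel ID:".

```python
from html.parser import HTMLParser


class CellTextCollector(HTMLParser):
    """Collect the text content of every <td> cell in a document (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.cells = []      # finished, whitespace-stripped cell texts
        self._in_td = False  # True while we are inside a <td> element
        self._buf = []       # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so "TD" also matches here.
        if tag == "td":
            self._in_td = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "td" and self._in_td:
            self.cells.append("".join(self._buf).strip())
            self._in_td = False

    def handle_data(self, data):
        # Text inside nested tags such as <b> still lands in the current cell.
        if self._in_td:
            self._buf.append(data)


def parcel_id_from_html(html, label="Parcel ID:"):
    """Return the text after `label` in the first <td> starting with it, else None."""
    parser = CellTextCollector()
    parser.feed(html)
    parser.close()
    for cell in parser.cells:
        if cell.startswith(label):
            return cell[len(label):].strip()
    return None


# Extra markup inside the cell does not break the parser-based lookup.
print(parcel_id_from_html("<td colspan=2><b>Parcel ID:</b> 39-034-15-009</td>"))  # 39-034-15-009
```

A regex such as `Parcel ID: (\d+(?:-\d+)*)` would miss the value in that last example because `</b>` sits between the label and the number; the parser-based version does not care. For one fixed line in a page that never changes, the regex is shorter; the parser pays off as the HTML gets messier.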
On Tue, Jan 27, 2009 at 10:39 AM, MRAB <goo...@mrabarnett.plus.com> wrote:
> Vincent Davis wrote:
>
>> I think there are two parts to this question, and I am sure there is a lot I am
>> missing. I am hoping an example will help me.
>> I have an HTML document that I am trying to use regular expressions to get a
>> value out of.
>> Here is an example of the line:
>> <td colspan='2'>Parcel ID: 39-034-15-009 </td>
>> I want to get the number "39-034-15-009" after "Parcel ID:". The number
>> will be different each time but always in the same format.
>> I think I can match "Parcel ID:", but I am not sure how to get the number after it.
>> "Parcel ID:" only occurs once in the document.
>>
>> Is this how I need to start?
>> pid = re.compile('Parcel ID: ')
>>
>> Basically I am completely lost and am not finding examples I find helpful.
>>
>> I am getting the HTML using myurl=urllib.urlopen(). Can I use RE like this?
>> thenum=pid.match(myurl)
>>
>> I think the two key things I need to know are:
>> 1. How do I get the text after a match?
>> 2. When I use myurl=urllib.urlopen(http://.......), can I use myurl
>> as the string in a RE, thenum=pid.match(myurl)?
>>
> Something like:
>
> pid = re.compile(r'Parcel ID: (\d+(?:-\d+)*)')
> myurl = urllib.urlopen(url)
> text = myurl.read()
> myurl.close()
> thenum = pid.search(text).group(1)
>
> Although BeautifulSoup is the preferred solution.
> --
> http://mail.python.org/mailman/listinfo/python-list
>
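For anyone reading this with Python 3, here is a sketch of MRAB's regex approach that also spells out the answers to both questions. (1) The parentheses in the pattern form a capturing group, and `.group(1)` returns just the text it captured. (2) `urlopen()` returns a file-like response object, not a string, so it has to be `read()` first; in Python 3 that gives bytes, which must be decoded. Note also that `match()` only tries position 0 of the string, while `search()` scans the whole string, which is why MRAB used `search()`. The function names `extract_parcel_id` and `fetch_parcel_id` are hypothetical, and the UTF-8 default is an assumption about the page's encoding.

```python
import re
from urllib.request import urlopen  # Python 3 home of Python 2's urllib.urlopen

# MRAB's pattern: the parenthesised group captures the digits-and-hyphens
# that follow "Parcel ID: ".
PID_RE = re.compile(r'Parcel ID: (\d+(?:-\d+)*)')


def extract_parcel_id(html):
    """Return the parcel ID found in an HTML string, or None if absent."""
    match = PID_RE.search(html)  # search() scans the whole string
    return match.group(1) if match else None


def fetch_parcel_id(url, encoding="utf-8"):
    """Download a page and extract the parcel ID (encoding is an assumption)."""
    # The response is a file-like object; read() returns bytes in Python 3.
    with urlopen(url) as response:
        text = response.read().decode(encoding, errors="replace")
    return extract_parcel_id(text)


sample = "<td colspan='2'>Parcel ID: 39-034-15-009 </td>"
print(extract_parcel_id(sample))  # 39-034-15-009
print(PID_RE.match(sample))       # None: the string starts with "<td", not "Parcel ID"
```

Returning `None` when nothing matches avoids the `AttributeError` that `pid.search(text).group(1)` raises if the page ever lacks a "Parcel ID:" line.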