Vincent Davis wrote:
I think there are two parts to this question, and I am sure there is lots I am
missing. I am hoping an example will help me.
I have an HTML doc that I am trying to use regular expressions to get a
value out of.
Here is an example of the line:
<td colspan='2'>Parcel ID: 39-034-15-009 </td>
I want to get the number "39-034-15-009" after "Parcel ID:". The number
will be different each time but always in the same format.
I think I can match "Parcel ID:" but not sure how to get the number
after. "Parcel ID:" only occurs once in the document.
Is this how I need to start?
pid = re.compile('Parcel ID: ')
Basically I am completely lost and am not finding examples I find helpful.
I am getting the html using myurl=urllib.urlopen().
Can I use an RE like this?
thenum=pid.match(myurl)
I think the two key things I need to know are:
1. How do I get the text after a match?
2. When I use myurl=urllib.urlopen(http://.......), can I use myurl
as the string in an RE, e.g. thenum=pid.match(myurl)?
Something like:
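import re
import urllib

pid = re.compile(r'Parcel ID: (\d+(?:-\d+)*)')
myurl = urllib.urlopen(url)  # Python 2 call; url is the page address (not shown here)
text = myurl.read()          # the RE needs a string, not the file-like object
myurl.close()
# search() scans the whole text; match() would only try at the very start
thenum = pid.search(text).group(1)  # group(1) is the parenthesised part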
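To try the pattern without fetching a page, here is a small sketch that runs it against the sample line from the question. The variable names sample, at_start and found are only for illustration. It also shows why search() is used rather than match(): match() only tries at the very start of the string, which here begins with "<td", so it finds nothing.

```python
import re

# Sample line taken from the question; "sample" is just an illustrative name.
sample = "<td colspan='2'>Parcel ID: 39-034-15-009 </td>"

pid = re.compile(r'Parcel ID: (\d+(?:-\d+)*)')

# match() anchors at the start of the string, so it finds nothing here.
at_start = pid.match(sample)

# search() scans the whole string; group(1) is the parenthesised part.
found = pid.search(sample)
thenum = found.group(1) if found else None

print(at_start)  # None
print(thenum)    # 39-034-15-009
```

Checking the result of search() before calling group(1) avoids an AttributeError on pages where "Parcel ID:" does not appear.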
That said, an HTML parser such as BeautifulSoup is the preferred way to pull
values out of HTML.
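As a rough sketch of the parser-based idea, the following uses the standard library's html.parser module (Python 3) rather than BeautifulSoup itself, so it runs without installing anything. The class name CellText, the sample string and its "Owner" cell are made up for illustration. It collects the text of each td cell and then picks the one that starts with "Parcel ID:".

```python
from html.parser import HTMLParser


class CellText(HTMLParser):
    """Collect the text content of every <td> cell (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.in_td = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_td = True
            self.cells.append("")  # start a new, empty cell

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_td = False

    def handle_data(self, data):
        if self.in_td:
            self.cells[-1] += data  # append text to the current cell


# Made-up sample row; only the second cell comes from the question.
sample = "<tr><td>Owner</td><td colspan='2'>Parcel ID: 39-034-15-009 </td></tr>"

parser = CellText()
parser.feed(sample)
parser.close()

prefix = "Parcel ID:"
parcel_id = None
for cell in parser.cells:
    cell = cell.strip()
    if cell.startswith(prefix):
        parcel_id = cell[len(prefix):].strip()
        break

print(parcel_id)  # 39-034-15-009
```

The point of going through a parser is that the lookup no longer depends on the exact markup around the cell, only on the cell's text.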