On 8 Jan, 08:44, Water Lin <water...@ymail.invalid> wrote:
> I am a new guy to use Python, but I want to parse a html page now. I
> tried to use HTMLParse. Here is my sample code:
> ----------------------
> from HTMLParser import HTMLParser
> from urllib2 import urlopen
>
> class MyParser(HTMLParser):
>     title = ""
>     is_title = ""
>     def __init__(self, url):
>         HTMLParser.__init__(self)
>         req = urlopen(url)
>         self.feed(req.read())
>
>     def handle_starttag(self, tag, attrs):
>         if tag == 'div' and attrs[0][1] == 'articleTitle':
>             print "Found link => %s" % attrs[0][1]
>             self.is_title = 1
>
>     def handle_data(self, data):
>         if self.is_title:
>             print "here"
>             self.title = data
>             print self.title
>             self.is_title = 0
> -----------------------
>
> For the tag
> -------
> <div class="articleTitle">open article title</div>
> -------
>
> I use my code to parse it. I can locate the div tag but I don't know how
> to get the text for the tag which is "open article title" in my example.
>
> How can I get the html content? What's wrong in my handle_data function?
>
> Thanks
>
> Water Lin
>
> --
> Water Lin's notes and pencils:http://en.waterlin.org
> Email: water...@ymail.com

I'd say your code works fine as posted.
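
If it still misbehaves on a real page, one thing worth checking is the `attrs[0][1]` lookup: it raises IndexError on any div without attributes, and it only matches when `class` happens to be the first attribute. Below is a rough sketch of how I might write the same idea, not a definitive fix. It is ported to Python 3 (`html.parser` instead of the `HTMLParser` module), and it feeds a string directly instead of calling `urlopen`, so it runs offline. The names `TitleParser` and `extract_title` are my own, not from your post, and I am assuming the title div contains no nested divs.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside <div class="articleTitle">...</div>."""

    def __init__(self):
        super().__init__()
        self.in_title = False  # True while we are inside the title div
        self.title = ""

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; look the class up by name
        # so that divs with no attributes, or with class not first, are safe.
        if tag == "div" and dict(attrs).get("class") == "articleTitle":
            self.in_title = True

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate instead of assigning.
        if self.in_title:
            self.title += data

    def handle_endtag(self, tag):
        # Assumption: no nested divs inside the title div.
        if tag == "div" and self.in_title:
            self.in_title = False


def extract_title(html):
    """Hypothetical helper: parse an HTML string and return the title text."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


print(extract_title('<div class="articleTitle">open article title</div>'))
# prints: open article title
```

To use it on a live page you would read the response body, decode it to `str`, and pass that to `extract_title`.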