I'm trying to understand why HTMLParser.feed() isn't returning the whole page. My test script is this:

    import urllib.request
    import html.parser

    class MyHTMLParser(html.parser.HTMLParser):
        def handle_starttag(self, tag, attrs):
            if tag == 'a' and attrs:
                print(tag,'-',attrs)

    url = 'http://x264.nl/x264/?dir=./64bit/8bit_depth'
    page = urllib.request.urlopen(url).read()
    parser = MyHTMLParser()
    parser.feed(str(page))

I can do print(page) and get the entire HTML source, but parser.feed(str(page)) only spits out the information for the top links and none of the "revisionxxxx" links.

Ultimately, I just want to find the name of the first "revisionxxxx" link (right now it's "revision1995"; when a new build is uploaded it will be "revision2000" or whatever). I figure this is a relatively simple page; once I understand all of this, I can move on to more complicated pages.

I've searched Google, but everything I find is either outdated, a recommendation for some external module (I don't need to do anything too fancy, and most modules don't completely support Python 3 anyway), or just a code snippet with no real explanation. I had a book that explained this, but I had to return it to the library (and I'll have to get back in line to check it out again).
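
One thing that may be worth checking here (a guess, not a confirmed diagnosis): in Python 3, urlopen(url).read() returns bytes, and str() on a bytes object gives its repr ("b'...'" with line endings shown as literal backslash escapes) rather than decoded text. The sketch below feeds decoded text to the parser instead and picks out the first link whose text starts with "revision". The sample bytes, the LinkTextParser class, and the UTF-8 encoding are all assumptions made for illustration; the real page's markup and encoding may differ.

```python
from html.parser import HTMLParser


class LinkTextParser(HTMLParser):
    """Hypothetical helper: collects the text inside each <a>...</a> pair."""

    def __init__(self):
        super().__init__()
        self.in_anchor = False
        self.link_texts = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            # Start a new, empty text entry for this link.
            self.in_anchor = True
            self.link_texts.append('')

    def handle_data(self, data):
        if self.in_anchor:
            # Text may arrive in several chunks, so append to the current entry.
            self.link_texts[-1] += data

    def handle_endtag(self, tag):
        if tag == 'a':
            self.in_anchor = False


# Made-up stand-in for the bytes returned by urllib.request.urlopen(url).read().
page = (b'<html><body>\r\n'
        b'<a href="/">Home</a>\r\n'
        b'<a href="./revision1995">revision1995</a>\r\n'
        b'<a href="./revision1990">revision1990</a>\r\n'
        b'</body></html>')

# str() on bytes yields the repr, not the decoded document.
as_repr = str(page)

# Decode explicitly instead; UTF-8 is an assumption about the page's encoding.
text = page.decode('utf-8', errors='replace')

parser = LinkTextParser()
parser.feed(text)
parser.close()

# Keep only the links whose visible text starts with "revision".
revisions = [t.strip() for t in parser.link_texts
             if t.strip().startswith('revision')]
first_revision = revisions[0] if revisions else None
print(first_revision)
```

With the real page, the same idea would be to replace the sample bytes with the result of urlopen(url).read() and decode it before calling feed().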