josh logan, 25.10.2010 04:14:
> I found the error. The HTML file I'm parsing has invalid HTML at line
> 193. It has something like:
> <a href="mystuff "class = "stuff">
> Note there is no space between the closing quote for the "href" tag
> and the class attribute. I guess I'll go through each file and correct
> these issues as I parse them.
HTMLParser is not designed to deal with broken (i.e. non-HTML) input. Take a
look at lxml.html or BeautifulSoup (up to version 3.0), which handle these
problems much better.
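If switching parsers isn't an option, the per-file fix-up mentioned above could be sketched as a small regex pre-pass. This is a hypothetical helper (`add_missing_attr_spaces` is not part of any library). It assumes attribute values are double-quoted, and because it does not track tag boundaries it could also alter matching text outside of tags. A real parser remains the more robust choice.

```python
import re

# Hypothetical pre-pass: a double-quoted attribute value followed directly
# by another attribute name (no whitespace, "/" or ">" in between) gets a
# space inserted after its closing quote.
# Assumption: only double-quoted values; tag boundaries are not tracked.
_MISSING_ATTR_SPACE = re.compile(r'(=\s*"[^"]*")(?=[^\s/>])')


def add_missing_attr_spaces(html):
    """Insert a space between a quoted attribute value and the next attribute."""
    return _MISSING_ATTR_SPACE.sub(r"\1 ", html)


if __name__ == "__main__":
    broken = '<a href="mystuff "class = "stuff">'
    print(add_missing_attr_spaces(broken))
    # prints: <a href="mystuff " class = "stuff">
```

Well-formed tags such as `<a href="x" class="y">` pass through unchanged, since the lookahead only fires when the closing quote is immediately followed by something other than whitespace, `/` or `>`.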
Stefan
--
http://mail.python.org/mailman/listinfo/python-list