josh logan, 25.10.2010 04:14:
I found the error. The HTML file I'm parsing has invalid HTML at line
193. It has something like:

<a href="mystuff "class = "stuff">

Note there is no space between the closing quote of the "href"
attribute value and the class attribute. I guess I'll go through each
file and correct these issues as I parse them.

HTMLParser is not made to deal with input that is not valid HTML. You can take a look at lxml.html or BeautifulSoup (up to 3.0), which handle these problems a lot better.
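
For illustration, here is a minimal sketch of how lxml.html might be used on a tag like the one quoted above. It assumes the third-party lxml package is installed; the link text and the variable names are made up for the example and are not from the original file.

```python
# Sketch: parse the malformed anchor tag with lxml's lenient HTML parser.
# Assumes the third-party lxml package is installed.
import lxml.html

# Same kind of markup as in the report: no space between the closing
# quote of the href value and the class attribute. The link text is
# invented for this example.
broken = '<a href="mystuff "class = "stuff">link text</a>'

# fromstring() recovers from the broken markup instead of raising.
doc = lxml.html.fromstring(broken)

# iter() also yields the root element itself, so this finds the <a>
# whether or not lxml wrapped the fragment in a parent element.
link = next(doc.iter("a"))

print(link.get("href"))
print(link.get("class"))
```

Both attributes should come back as separate values, so the files would not need to be corrected by hand before parsing.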

Stefan

--
http://mail.python.org/mailman/listinfo/python-list