Felipe De Bene wrote:
> I'm having problems parsing an HTML file with the following syntax:
> 
> <TABLE cellspacing=0 cellpadding=0 ALIGN=CENTER BORDER=1 width='100%'>
>     <TH BGCOLOR='#c0c0c0' Width='3%'>User ID</TH>
>     <TH Width='10%' BGCOLOR='#c0c0c0'>Name</TH><TH width='7%'
> BGCOLOR='#c0c0c0'>Date</TH>
> and so on....
> 
> Whenever I feed such a file to the parser, I get the error:
> 
> HTMLParser.HTMLParseError: bad end tag: "</TH BGCOLOR='#c0c0c0'>", at
> line 515, column 45

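Your HTML page is not valid HTML, i.e. it is broken. As the error message shows,
it contains end tags that carry attributes, such as </TH BGCOLOR='#c0c0c0'>,
which HTML does not allow. Python's HTMLParser is not made for parsing broken
HTML. However, you can use the parser in lxml.html, which will fix up your HTML
for you.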

http://codespeak.net/lxml/
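A minimal sketch of that approach, assuming lxml is installed. The sample markup
and the helper names header_texts and cleaned_markup are hypothetical; the sample
is modelled on the snippet and the error message quoted above, including an end
tag with an attribute:

```python
# Sketch: let lxml.html's forgiving parser handle markup that the
# standard-library HTMLParser rejected with "bad end tag".
import lxml.html

# Hypothetical sample: note the end tag </TH BGCOLOR='#c0c0c0'>,
# which is what triggered the HTMLParseError in the original post.
SAMPLE = """
<TABLE cellspacing=0 cellpadding=0 ALIGN=CENTER BORDER=1 width='100%'>
    <TH BGCOLOR='#c0c0c0' Width='3%'>User ID</TH BGCOLOR='#c0c0c0'>
    <TH Width='10%' BGCOLOR='#c0c0c0'>Name</TH><TH width='7%'
BGCOLOR='#c0c0c0'>Date</TH>
</TABLE>
"""


def header_texts(html):
    """Return the stripped text of every <th> cell, in document order."""
    doc = lxml.html.document_fromstring(html)
    # The HTML parser lowercases tag names, so search for "th".
    return [th.text_content().strip() for th in doc.iter("th")]


def cleaned_markup(html):
    """Parse the broken HTML and serialize the tree back to a string.

    The serializer always writes plain end tags such as </th>, so the
    output no longer contains end tags with attributes.
    """
    doc = lxml.html.document_fromstring(html)
    return lxml.html.tostring(doc, encoding="unicode")


if __name__ == "__main__":
    print(header_texts(SAMPLE))
    print(cleaned_markup(SAMPLE))
```

Because lxml's HTML parser recovers from such errors instead of raising, you can
either work with the parsed tree directly, as header_texts does, or serialize it
back to cleaner markup and feed that to your existing HTMLParser code.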

Stefan
--
http://mail.python.org/mailman/listinfo/python-list