Felipe De Bene wrote:
> I'm having problems parsing an HTML file with the following syntax :
>
> <TABLE cellspacing=0 cellpadding=0 ALIGN=CENTER BORDER=1 width='100%'>
> <TH BGCOLOR='#c0c0c0' Width='3%'>User ID</TH>
> <TH Width='10%' BGCOLOR='#c0c0c0'>Name</TH><TH width='7%'
> BGCOLOR='#c0c0c0'>Date</TH>
> and so on....
>
> whenever I feed the parser with such file I get the error :
>
> HTMLParser.HTMLParseError: bad end tag: "</TH BGCOLOR='#c0c0c0'>", at
> line 515, column 45

Your HTML page is not valid HTML, i.e. it is broken: the error message shows an end tag that carries attributes ("</TH BGCOLOR='#c0c0c0'>"), which is not allowed. Python's HTMLParser is not made for parsing broken HTML. However, you can use the parser in lxml.html to fix up your HTML for you.

http://codespeak.net/lxml/

Stefan
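
A minimal sketch of the lxml.html approach, assuming lxml is installed. The markup in `broken` is a made-up sample modelled on the quoted snippet, with the bad end tag from the error message placed in the first header cell. The variable names `broken`, `root`, `headers` and `cleaned` are illustrative only.

```python
import lxml.html

# Hypothetical sample: the first </TH> carries attributes, like the end tag
# named in the error message. The real file is not shown in the post.
broken = """
<TABLE cellspacing=0 cellpadding=0 ALIGN=CENTER BORDER=1 width='100%'>
<TH BGCOLOR='#c0c0c0' Width='3%'>User ID</TH BGCOLOR='#c0c0c0'>
<TH Width='10%' BGCOLOR='#c0c0c0'>Name</TH><TH width='7%'
BGCOLOR='#c0c0c0'>Date</TH>
</TABLE>
"""

# lxml.html parses in recovery mode, so malformed tags do not raise an error.
root = lxml.html.fromstring(broken)

# Tag names are lower-cased by the parser, so look for "th", not "TH".
headers = [th.text_content().strip() for th in root.iter("th")]

# Serialise the repaired tree back to a string of well-formed HTML.
cleaned = lxml.html.tostring(root, pretty_print=True, encoding="unicode")

print(headers)
print(cleaned)
```

The string in `cleaned` can then be written back to disk, or the data can be pulled straight from the tree as done for `headers`, without going through HTMLParser at all.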