Gabriel Genellina wrote:
> On 21 ago, 18:36, [EMAIL PROTECTED] (John J. Lee) wrote:
>> Gabriel Genellina <[EMAIL PROTECTED]> writes:
>>
>> [...]
>>> Don't even try to understand it - it's a mess. Use the HTMLParser
>>> module instead.
>> [...]
>>
>> Module sgmllib (and therefore module htmllib also) is more tolerant of
>> bad HTML than module HTMLParser.
>
> I had the impression it was the opposite; anyway, neither of them can
> handle really bad html.
> I just don't *like* htmllib.HTMLParser - but that's only a matter of
> taste.

lxml.html handles bad HTML and it's a powerful tool that is very easy to
use. And if one day you have to deal with really, *really* broken tag soup,
it also comes with BeautifulSoup parser integration.

Stefan
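
For illustration, here is a minimal sketch of parsing sloppy markup with lxml.html. It assumes the third-party lxml package is installed; the broken input string and the example.com URL are made up for this sketch and do not come from the thread. The markup has unclosed <p> and <b> tags and no closing </html>, and the parser is expected to recover a usable tree from it anyway.

```python
import lxml.html

# Hypothetical input: unclosed <p> and <b> tags, no closing </html>.
broken = (
    "<html><body><p>First<p>Second "
    "<a href='http://example.com/'>link<b>bold</a></body>"
)

# document_fromstring() always returns the <html> root element,
# even when the input is malformed.
doc = lxml.html.document_fromstring(broken)

# Walk the recovered tree: collect link targets and paragraph text.
links = [a.get("href") for a in doc.iter("a")]
paragraphs = [p.text_content() for p in doc.iter("p")]

print(links)
print(paragraphs)
```

For the really broken cases, the BeautifulSoup integration mentioned above lives in lxml.html.soupparser, whose fromstring() function is used the same way; it additionally needs BeautifulSoup installed.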