none wrote:
> Gabriel Genellina wrote:
>
>> On Mon, 18 Jun 2007 16:38:18 -0300, Sergio Monteiro Basto
>> <[EMAIL PROTECTED]> wrote:
>>
>>> Can someone explain to me what is wrong with this site?
>>>
>>> python linkExtractor3.py http://www.noticiasdeaveiro.pt > test
>
> OK, but my problem is that I don't understand what the specific problem
> at line 1173 is.
>
>> HTMLParser expects valid HTML; try a different tool, like
>> BeautifulSoup, which is specifically designed to handle malformed pages.
>>
>> --Gabriel Genellina

Yes, you almost have to use BeautifulSoup on real-world web pages. Even
that may not be enough; I have my own, even more robust version of
BeautifulSoup. (I've sent the fixes, which are small, to the author.)

The usual BeautifulSoup killer is improperly terminated HTML comments.
The default action is to swallow the rest of the entire document into
the comment, which is usually not what you want. I have a fix for that at

http://mail.python.org/pipermail/python-list/2007-May/440370.html

John Nagle
-- 
http://mail.python.org/mailman/listinfo/python-list
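The fix linked above is not reproduced here. As a rough illustration of the failure mode (an unterminated `<!--` eating everything after it), here is a minimal sketch of a pre-pass you could run before handing a page to a parser. It assumes Python 3's standard-library `html.parser`; the names `close_unterminated_comments`, `LinkCollector` and `extract_links` are hypothetical, and the "cut the comment at the first `>`" rule is a guessed heuristic, not Nagle's or BeautifulSoup's actual behavior.

```python
from html.parser import HTMLParser


def close_unterminated_comments(html: str) -> str:
    """Hypothetical heuristic: if '<!--' is not closed by '-->' before the
    next '<!--' (or before end of document), assume the comment was meant
    to end at the first '>' after the opener, and close it properly."""
    out = []
    pos = 0
    while True:
        start = html.find("<!--", pos)
        if start == -1:
            out.append(html[pos:])  # no more comments: copy the rest
            break
        out.append(html[pos:start])
        body_start = start + 4
        end = html.find("-->", body_start)
        next_open = html.find("<!--", body_start)
        if end != -1 and (next_open == -1 or end < next_open):
            # Properly terminated comment: copy it through unchanged.
            out.append(html[start:end + 3])
            pos = end + 3
            continue
        # Unterminated comment: cut it at the first '>' after the opener.
        gt = html.find(">", body_start)
        if gt == -1:
            # No '>' anywhere after it: drop the stray '<!--', keep the text.
            out.append(html[body_start:])
            break
        body = html[body_start:gt].rstrip("-")  # e.g. '<!-- x ->' -> ' x '
        out.append("<!--" + body + "-->")
        pos = gt + 1
    return "".join(out)


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag (stand-in for a link extractor)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html: str) -> list:
    parser = LinkCollector()
    parser.feed(close_unterminated_comments(html))
    parser.close()
    return parser.links
```

For example, `close_unterminated_comments('<p>a</p><!-- broken -><a href="/x">x</a>')` returns `'<p>a</p><!-- broken --><a href="/x">x</a>'`, so the link after the broken comment is still seen. The heuristic can guess wrong (a comment that legitimately contains `>` gets cut short), which is why a real fix belongs in the parser itself.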