Re: HTMLParser.HTMLParseError: EOF in middle of construct

John Nagle Tue, 19 Jun 2007 21:26:58 -0700

none wrote:
> Gabriel Genellina wrote:
> 
>> On Mon, 18 Jun 2007 16:38:18 -0300, Sergio Monteiro Basto 
>> <[EMAIL PROTECTED]> wrote:
>>
>>> Can someone explain to me what is wrong with this site?
>>>
>>> python linkExtractor3.py http://www.noticiasdeaveiro.pt > test


> OK, but my problem is that I don't understand what the specific problem
> at line 1173 is.
> 
>> HTMLParser expects valid HTML - try a different tool, like 
>> BeautifulSoup, which is specially designed to handle malformed pages.
>>
>> --Gabriel Genellina

    Yes, you almost have to use BeautifulSoup on real-world web pages.
Even that may not be enough; I have my own even more robust version of
BeautifulSoup.  (I've sent the fixes, which are small, to the author.)

    The usual BeautifulSoup killer is improperly terminated HTML comments. The
default action is to suck up the rest of the entire document into
the comment, which is usually not what you want.  I have a fix for that
at

http://mail.python.org/pipermail/python-list/2007-May/440370.html
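
    To illustrate the comment problem, here is a minimal sketch of a
pre-processing pass that repairs unterminated comments before the page
reaches a parser.  It is not the fix from the message linked above; the
function name close_unterminated_comments and the heuristic (end a comment
that has no "-->" at the next ">") are assumptions made for illustration.

```python
def close_unterminated_comments(html):
    """Return html with every unterminated '<!--' comment closed.

    Hypothetical helper: a comment opener with no later '-->' is treated
    as ending at the next '>' (an assumed heuristic), so it no longer
    swallows the rest of the document.
    """
    out = []
    pos = 0
    while True:
        start = html.find("<!--", pos)
        if start == -1:
            # No more comment openers: copy the remainder unchanged.
            out.append(html[pos:])
            break
        out.append(html[pos:start])
        end = html.find("-->", start + 4)
        if end != -1:
            # Properly terminated comment: copy it through as-is.
            out.append(html[start:end + 3])
            pos = end + 3
            continue
        gt = html.find(">", start + 4)
        if gt == -1:
            # Nothing that could end the comment: close it at end of input.
            out.append(html[start:] + "-->")
            break
        # Close the comment where the stray '>' was.
        out.append("<!--" + html[start + 4:gt] + "-->")
        pos = gt + 1
    return "".join(out)


broken = "<p>a</p><!-- oops ><p>b</p>"
repaired = close_unterminated_comments(broken)
```

    The repaired text can then be handed to whatever parser you use.  The
heuristic is deliberately crude: a ">" inside the intended comment text
would end the comment early, so treat it as a starting point only.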

                                John Nagle
