Re: BeautifulSoup vs. real-world HTML comments - possible fix

2007-05-13 Thread John Nagle
John Nagle wrote:
> Note what happens when a bad declaration is found.
> SGMLParser.parse_declaration raises SGMLParseError, and the exception handler just sucks up the rest of the input (note that "rawdata[i:]"), treats it as unparsed data, and advances the position to the end of input
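
A minimal sketch of the kind of recovery being proposed here. When a "<!" construct fails to parse, it skips only that construct instead of treating rawdata[i:] as text. The helper name skip_bad_declaration is hypothetical and is not sgmllib or BeautifulSoup code. The rule it applies is also an assumption about what reasonable recovery means: for something that opened as a comment, resume after the first "-->"; otherwise resume after the next ">".

```python
def skip_bad_declaration(rawdata: str, i: int) -> int:
    """Return the index just past a malformed '<!...' construct at rawdata[i].

    Hypothetical recovery rule (not the library's behaviour): rather than
    consuming everything from i to the end of input, resume parsing after
    the construct itself.
    """
    if not rawdata.startswith("<!", i):
        raise ValueError("expected '<!' at position %d" % i)
    if rawdata.startswith("<!--", i):
        # Opened like a comment: end it at the first '-->', ignoring any
        # '--' pairs inside (roughly what browsers do).
        end = rawdata.find("-->", i + 4)
        if end != -1:
            return end + 3
    # Otherwise (or if '-->' never appears) fall back to the next '>'.
    end = rawdata.find(">", i + 2)
    # Only when no '>' is left at all do we give up on the rest of the input.
    return len(rawdata) if end == -1 else end + 1


page = "<p>before</p><!-- bad -- comment --><p>after</p>"
start = page.index("<!")
print(page[skip_bad_declaration(page, start):])  # <p>after</p>
```

The fallback to the next ">" can stop early when a declaration contains a ">" inside a quoted string. Even then, a bad construct costs at most that one construct rather than the whole page.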

Re: BeautifulSoup vs. real-world HTML comments - possible fix

2007-05-13 Thread John Nagle
Robert Kern wrote:
> Carl Banks wrote:
>> On Apr 4, 4:55 pm, Robert Kern <[EMAIL PROTECTED]> wrote:
>>> Carl Banks wrote:
>>>> On Apr 4, 2:43 pm, Robert Kern <[EMAIL PROTECTED]> wrote:
>>>>> Carl Banks wrote:
>>>>>> On Apr 4, 2:08 pm, John Nagle <[EMAIL PROTECTED]> wrote:

Re: BeautifulSoup vs. real-world HTML comments

2007-04-04 Thread Robert Kern
Carl Banks wrote:
> On Apr 4, 4:55 pm, Robert Kern <[EMAIL PROTECTED]> wrote:
>> Carl Banks wrote:
>>> On Apr 4, 2:43 pm, Robert Kern <[EMAIL PROTECTED]> wrote:
>>>> Carl Banks wrote:
>>>>> On Apr 4, 2:08 pm, John Nagle <[EMAIL PROTECTED]> wrote:
>>>>>> BeautifulSoup can't parse this page usefully

Re: BeautifulSoup vs. real-world HTML comments

2007-04-04 Thread Carl Banks
On Apr 4, 4:55 pm, Robert Kern <[EMAIL PROTECTED]> wrote:
> Carl Banks wrote:
> > On Apr 4, 2:43 pm, Robert Kern <[EMAIL PROTECTED]> wrote:
> >> Carl Banks wrote:
> >>> On Apr 4, 2:08 pm, John Nagle <[EMAIL PROTECTED]> wrote:
> >>>> BeautifulSoup can't parse this page usefully at all.
> >>>> It tre

Re: BeautifulSoup vs. real-world HTML comments

2007-04-04 Thread Robert Kern
Carl Banks wrote:
> On Apr 4, 2:43 pm, Robert Kern <[EMAIL PROTECTED]> wrote:
>> Carl Banks wrote:
>>> On Apr 4, 2:08 pm, John Nagle <[EMAIL PROTECTED]> wrote:
>>>> BeautifulSoup can't parse this page usefully at all.
>>>> It treats the entire page as a text chunk. It's actually HTMLParser th

Re: BeautifulSoup vs. real-world HTML comments

2007-04-04 Thread Paul Boddie
John Nagle wrote:
> The syntax that browsers understand as HTML comments is much less restrictive than what BeautifulSoup understands. I keep running into sites with formally incorrect HTML comments which are parsed happily by browsers. Here's yet another example, this one from "http://ww

Re: BeautifulSoup vs. real-world HTML comments

2007-04-04 Thread Carl Banks
On Apr 4, 2:43 pm, Robert Kern <[EMAIL PROTECTED]> wrote:
> Carl Banks wrote:
> > On Apr 4, 2:08 pm, John Nagle <[EMAIL PROTECTED]> wrote:
> >> BeautifulSoup can't parse this page usefully at all.
> >> It treats the entire page as a text chunk. It's actually HTMLParser that parses comments, s

Re: BeautifulSoup vs. real-world HTML comments

2007-04-04 Thread Steve Holden
Carl Banks wrote:
> On Apr 4, 2:08 pm, John Nagle <[EMAIL PROTECTED]> wrote:
>> The syntax that browsers understand as HTML comments is much less restrictive than what BeautifulSoup understands. I keep running into sites with formally incorrect HTML comments which are parsed happily b

Re: BeautifulSoup vs. real-world HTML comments

2007-04-04 Thread irstas
Carl Banks wrote:
> On Apr 4, 2:08 pm, John Nagle <[EMAIL PROTECTED]> wrote:
> > The syntax that browsers understand as HTML comments is much less restrictive than what BeautifulSoup understands. I keep running into sites with formally incorrect HTML comments which are parsed happily

Re: BeautifulSoup vs. real-world HTML comments

2007-04-04 Thread Robert Kern
Carl Banks wrote:
> On Apr 4, 2:08 pm, John Nagle <[EMAIL PROTECTED]> wrote:
>> BeautifulSoup can't parse this page usefully at all. It treats the entire page as a text chunk. It's actually HTMLParser that parses comments, so this is really an HTMLParser level problem.
>
> Google for a
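
The thread places the problem in HTMLParser's comment handling. This small sketch shows where comments surface in Python 3's html.parser module, the successor of the 2007 HTMLParser module. The class name CommentCollector is made up for illustration. How the module treats malformed comments has changed between Python versions, so the sketch only demonstrates the handle_comment hook on a well-formed comment.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Record every comment the standard-library parser reports."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # Called with the text between '<!--' and '-->'.
        self.comments.append(data)


parser = CommentCollector()
parser.feed("<p>text</p><!-- a well-formed comment --><p>more</p>")
parser.close()
print(parser.comments)  # [' a well-formed comment ']
```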

Re: BeautifulSoup vs. real-world HTML comments

2007-04-04 Thread Carl Banks
On Apr 4, 2:08 pm, John Nagle <[EMAIL PROTECTED]> wrote:
> The syntax that browsers understand as HTML comments is much less restrictive than what BeautifulSoup understands. I keep running into sites with formally incorrect HTML comments which are parsed happily by browsers. Here's yet

BeautifulSoup vs. real-world HTML comments

2007-04-04 Thread John Nagle
The syntax that browsers understand as HTML comments is much less restrictive than what BeautifulSoup understands. I keep running into sites with formally incorrect HTML comments which are parsed happily by browsers. Here's yet another example, this one from "http://www.webdirectory.com". T
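
One workaround for pages like this is to pre-process the text so that comments follow browser-style rules before any SGML-style parser sees them. Under those rules a comment starts at "<!--" and ends at the first "-->", whatever "--" pairs appear inside. A comment that is never closed runs to the end of the document. The function strip_comments below is a hypothetical sketch of that idea, not part of BeautifulSoup. It simplifies the HTML5 tokenizer rules: it ignores cases such as "<!-->" and "--!>", and it would also remove "<!--" sequences that appear inside script text.

```python
import re

# Browser-style comment: from '<!--' to the first '-->', or to the end of the
# input if the comment is never closed. Internal '--' pairs do not end it.
_COMMENT = re.compile(r"<!--.*?(?:-->|\Z)", re.DOTALL)


def strip_comments(html: str) -> str:
    """Remove browser-style HTML comments (hypothetical pre-processing step)."""
    return _COMMENT.sub("", html)


print(strip_comments("<p>a</p><!-- x -- y --><p>b</p>"))  # <p>a</p><p>b</p>
print(strip_comments("text<!-- never closed <p>hidden</p>"))  # text
```

The cleaned string can then be handed to whatever parser is in use. The trade-off is that comment text is discarded rather than kept as comment nodes.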