Re: BeautifulSoup vs. real-world HTML comments

Carl Banks Wed, 04 Apr 2007 15:06:05 -0700

On Apr 4, 4:55 pm, Robert Kern <[EMAIL PROTECTED]> wrote:
> Carl Banks wrote:
> > On Apr 4, 2:43 pm, Robert Kern <[EMAIL PROTECTED]> wrote:
> >> Carl Banks wrote:
> >>> On Apr 4, 2:08 pm, John Nagle <[EMAIL PROTECTED]> wrote:
> >>>> BeautifulSoup can't parse this page usefully at all.
> >>>> It treats the entire page as a text chunk.  It's actually
> >>>> HTMLParser that parses comments, so this is really an HTMLParser
> >>>> level problem.
> >>> Google for a program called "tidy".  Install it, and run it as a
> >>> filter on any HTML you download.  "tidy" has invested in it quite a
> >>> bit of work understanding common bad HTML and how browsers deal with
> >>> it.  It would be pointless to duplicate that work in the Python
> >>> standard library; let HTMLParser be small and tight, and outsource the
> >>> handling of floozy input to a dedicated program.
> >> Well, BeautifulSoup is just such a dedicated library.
>
> > No, not really.
>
> Yes, it is. Whether it succeeds in all particulars is beside the point. The
> only mission of BeautifulSoup is to handle bad HTML.


I think the authors of BeautifulSoup have the right to decide what
their own mission is.
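
To make the "run it as a filter" suggestion quoted above concrete, here
is a rough sketch of piping downloaded HTML through tidy from Python
before handing it to a parser. The function name tidy_filter and the
fall-back-to-raw-input behaviour are my own assumptions for
illustration, and the flags are only one plausible set; check them
against your installed tidy.

```python
import shutil
import subprocess


def tidy_filter(html, tidy_path="tidy"):
    """Return html cleaned up by the external tidy program.

    Hypothetical helper: if tidy is not installed, or produces no
    output, the input is returned unchanged (an assumption made so the
    sketch stays usable without tidy).
    """
    exe = shutil.which(tidy_path)
    if exe is None:
        return html

    proc = subprocess.run(
        [
            exe,
            "-q",                      # suppress nonessential output
            "-utf8",                   # treat input and output as UTF-8
            "--show-warnings", "no",   # keep diagnostics quiet
            "--force-output", "yes",   # emit markup even when errors are found
        ],
        input=html.encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
    )
    # tidy exits with 1 for warnings and 2 for errors, so a nonzero
    # status is not treated as a failure here.
    cleaned = proc.stdout.decode("utf-8", errors="replace")
    return cleaned if cleaned.strip() else html
```

The cleaned string can then be passed to HTMLParser or BeautifulSoup in
place of the raw download.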


Carl Banks

-- 
http://mail.python.org/mailman/listinfo/python-list
