is this where i've seen references to integrating BeautifulSoup in the web
browsing app?

-bruce


-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]] On Behalf Of John J Lee
Sent: Monday, July 10, 2006 2:29 AM
To: [EMAIL PROTECTED]
Cc: python-list@python.org
Subject: RE: [wwwsearch-general] ClientForm request re ParseErrors


On Sun, 9 Jul 2006, bruce wrote:
[...]
> sgmllib.SGMLParseError: expected name token at '<! Others/0/WIN; Too'
>
>
> partial html
> -----------------------------------
> </table>
> <br />
> <FORM NAME='main' METHOD=POST
> Action="/servlets/iclientservlet/a2k_prd/?ICType=Panel&Menu=SA_LEARNER_SERVICES&Market=GBL&PanelGroupName=CLASS_SEARCH"  autocomplete=off>
> <INPUT TYPE=hidden NAME=ICType VALUE=Panel>
> <INPUT TYPE=hidden NAME=ICElementNum VALUE="0">
> <INPUT TYPE=hidden NAME=ICStateNum VALUE="1">
[...]

You don't include the HTML mentioned in the exception message ('<!
Others/0/WIN; Too') in the part of the HTML that you quote, but that
snippet is enough to see what's wrong, and lets you find exactly where in
the HTML the problem lies.  Comments in HTML start with '<!--' and end
with '-->'.  The comment sgmllib is complaining about is missing the '--'.
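To track down the offending markup, something like the following may help.
It is only a sketch: find_snippet is a hypothetical helper name, not part of
mechanize or sgmllib, and the sample HTML is made up around the fragment
from the exception message.

```python
def find_snippet(html, snippet, context=40):
    """Return (line_number, surrounding_text) for the first occurrence
    of snippet in html, or None if it does not occur."""
    pos = html.find(snippet)
    if pos == -1:
        return None
    # Line numbers are 1-based: count the newlines before the match.
    line_no = html.count("\n", 0, pos) + 1
    start = max(0, pos - context)
    return line_no, html[start:pos + len(snippet) + context]


# Made-up page; only the fragment from the exception message is real.
page = "<table>\n</table>\n<! Others/0/WIN; Too many>\n<br />"
hit = find_snippet(page, "<! Others/0/WIN; Too")
if hit is not None:
    print("line %d: %r" % hit)
```

Pass it the text quoted in the SGMLParseError message and it reports the line
to inspect in the page source.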

You can work around bad HTML using the .set_data() method on response
objects and the .set_response() method on Browser.  Call the latter before
you call any other methods that would require parsing the HTML.

r = br.response()  # a copy of the current response
r.set_data(clean_html(r.get_data()))  # the body comes from the response, not the Browser
br.set_response(r)


You must write clean_html yourself (though you may use an external tool to
do so, of course).
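
For the malformed-comment case above, a clean_html along these lines might be
enough. Treat it as a minimal sketch under assumptions: it works on a text
string, it simply drops any '<! ...>' that is neither a proper '<!--' comment
nor a declaration such as '<!DOCTYPE', and the sample input is invented
around the fragment from the traceback. Real pages may need more than this.

```python
import re

# "<!" followed by something that is neither "--" (a real comment) nor a
# letter or "[" (a declaration like <!DOCTYPE ...> or <![CDATA[ ...),
# up to and including the next ">".
_BOGUS_DECL = re.compile(r"<!(?!--)(?![A-Za-z\[])[^>]*>")


def clean_html(html):
    """Remove malformed '<! ...>' markup that a strict SGML parser rejects."""
    return _BOGUS_DECL.sub("", html)


# Invented sample; the text after "Too" is not from the original page.
sample = "<p>a</p><! Others/0/WIN; Too much><p>b</p>"
print(clean_html(sample))
```

Well-formed comments and DOCTYPE declarations are left untouched, so the
function should be safe to apply to every response before parsing.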

Alternatively, use a more robust parser, e.g.

br = mechanize.Browser(factory=mechanize.RobustFactory())


(you may also integrate another parser of your choice with mechanize, with
more effort)


John
--
http://mail.python.org/mailman/listinfo/python-list
