hi john... this is in regards to the web/parsing/factory/beautifulsoup....
to reiterate, i have python 2.4, mechanize, browser, beautifulsoup installed. i have the latest mech from svn. i'm getting the same err as reported by john t. the code/err follows.. (i can resend the test html if you need)

any thoughts/pointers/etc would be helpful...

thanks

-bruce

test code

#! /usr/bin/env python

#test python script
import re
import libxml2dom
import urllib
import urllib2
import sys, string
#import numarray
import httplib
from mechanize import Browser, RobustFactory
import mechanize
import BeautifulSoup

########################
#
# Parsing App Information
########################

# datafile
tfile = open("stanford.dat", 'wr+')

cj = mechanize.CookieJar()
br = Browser()

if __name__ == "__main__":
    # main app

    #----------------------------
    # start trying to get the stanford pages
    cj = mechanize.CookieJar()
    br = Browser(factory=RobustFactory())

    fh = open('axess.dat')
    s = fh.read()
    fh.close()

    br.open("file:///home/test/axess.dat")
    .
    .
    .
    .

err/output

Traceback (most recent call last):
  File "./axess.py", line 45, in ?
    br.open("file:///home/test/axess.dat")
  File "build/bdist.linux-i686/egg/mechanize/_mechanize.py", line 130, in open
  File "build/bdist.linux-i686/egg/mechanize/_mechanize.py", line 170, in _mech_open
  File "build/bdist.linux-i686/egg/mechanize/_mechanize.py", line 213, in set_response
  File "build/bdist.linux-i686/egg/mechanize/_html.py", line 577, in set_response
  File "build/bdist.linux-i686/egg/mechanize/_html.py", line 316, in __init__
  File "/usr/lib/python2.4/site-packages/BeautifulSoup.py", line 1326, in __init__
    BeautifulStoneSoup.__init__(self, *args, **kwargs)
  File "/usr/lib/python2.4/site-packages/BeautifulSoup.py", line 973, in __init__
    self._feed()
  File "/usr/lib/python2.4/site-packages/BeautifulSoup.py", line 987, in _feed
    smartQuotesTo=self.smartQuotesTo)
  File "/usr/lib/python2.4/site-packages/BeautifulSoup.py", line 1580, in __init__
    u = self._convertFrom(proposedEncoding)
  File "/usr/lib/python2.4/site-packages/BeautifulSoup.py", line 1614, in _convertFrom
    proposed = self.find_codec(proposed)
  File "/usr/lib/python2.4/site-packages/BeautifulSoup.py", line 1731, in find_codec
    return self._codec(self.CHARSET_ALIASES.get(charset, charset)) \
  File "/usr/lib/python2.4/site-packages/BeautifulSoup.py", line 1740, in _codec
    codecs.lookup(charset)
TypeError: lookup() argument 1 must be string, not bool

is this where i've seen references to integrating Beautifulsoup in the web browsing app?

-bruce

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of John J Lee
Sent: Monday, July 10, 2006 2:29 AM
To: [EMAIL PROTECTED]
Cc: python-list@python.org
Subject: RE: [wwwsearch-general] ClientForm request re ParseErrors

On Sun, 9 Jul 2006, bruce wrote:
[...]
> sgmllib.SGMLParseError: expected name token at '<!
Others/0/WIN; Too'
>
>
> partial html
> -----------------------------------
> </table>
> <br />
> <FORM NAME='main' METHOD=POST
> Action="/servlets/iclientservlet/a2k_prd/?ICType=Panel&Menu=SA_LEARNER_SERVICES&Market=GBL&PanelGroupName=CLASS_SEARCH" autocomplete=off>
> <INPUT TYPE=hidden NAME=ICType VALUE=Panel>
> <INPUT TYPE=hidden NAME=ICElementNum VALUE="0">
> <INPUT TYPE=hidden NAME=ICStateNum VALUE="1">
[...]

You don't include the HTML mentioned in the exception message ('<! Others/0/WIN; Too') in the part of the HTML that you quote, but that snippet is enough to see what's wrong, and lets you find exactly where in the HTML the problem lies. Comments in HTML start with '<!--' and end with '-->'. The comment sgmllib is complaining about is missing the '--'.

You can work around bad HTML using the .set_data() method on response objects and the .set_response() method on Browser. Call the latter before you call any other methods that would require parsing the HTML.

r = br.response()
r.set_data(clean_html(r.get_data()))
br.set_response(r)

You must write clean_html yourself (though you may use an external tool to do so, of course).

Alternatively, use a more robust parser, e.g.

br = mechanize.Browser(factory=mechanize.RobustFactory())

(you may also integrate another parser of your choice with mechanize, with more effort)

John
--
http://mail.python.org/mailman/listinfo/python-list
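
For the clean_html helper that John's reply leaves to the reader, a minimal sketch might look like the following. It is only an illustration: the offending page isn't shown in the thread, so the assumption here is that the bad markup is a '<! ...' run closed by the next '>', and the regex name _BAD_DECL is made up for this example. It simply drops such runs and leaves proper comments, DOCTYPE and '<![' sections alone.

```python
import re

# Hypothetical pattern (not from the thread): a "<!" that does not start a
# proper comment ("<!--"), a marked section ("<![") or a DOCTYPE, up to the
# next ">". Assumes the malformed markup really is closed by a ">".
_BAD_DECL = re.compile(r"<!(?!--|\[|DOCTYPE\b)[^>]*>", re.IGNORECASE)


def clean_html(html):
    """Return html with malformed '<! ... >' runs removed."""
    return _BAD_DECL.sub("", html)


# Intended use with the workaround quoted above (needs mechanize and an
# already opened Browser instance named br, so it is left commented out):
#
#   r = br.response()
#   r.set_data(clean_html(r.get_data()))
#   br.set_response(r)
```

A regex like this is a blunt tool; if the page has other problems, an external tidy-up tool, as John suggests, is probably the safer route.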