Hi John.

Thanks for your reply. I tried your suggestion of using RobustFactory, and I still get badly malformed HTML back. The HTML is listed below. I would have thought that mechanize would have interpreted the http-equiv="refresh" tag. Unfortunately, mechanize apparently isn't able to handle a <meta http-equiv="refresh" content="0;url=/foo/..."> when it's inside the <body> of the HTML...
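One possible workaround, as a rough sketch only: pull the refresh target out of the raw HTML by hand and open it explicitly. The sketch uses the standard library's html.parser, which tolerates unclosed tags and does not care whether the <meta> sits in <head> or <body>. The names MetaRefreshFinder and find_refresh_url are made up for illustration and are not part of mechanize.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MetaRefreshFinder(HTMLParser):
    """Record the first <meta http-equiv="refresh"> target, wherever it appears."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta" or self.target is not None:
            return
        attr = dict(attrs)
        if (attr.get("http-equiv") or "").lower() != "refresh":
            return
        # CONTENT looks like "0;url=/some/path": skip the delay, keep the URL.
        _, _, rest = (attr.get("content") or "").partition(";")
        key, _, value = rest.strip().partition("=")
        if key.strip().lower() == "url":
            self.target = value.strip().strip("'\"")


def find_refresh_url(html, base_url):
    """Return the absolute refresh URL found in html, or None if there is none."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    finder.close()
    if finder.target is None:
        return None
    return urljoin(base_url, finder.target)
```

If this finds a URL in the decoded page text, the result could then be passed to br.open() to follow the refresh manually.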
test.html
------------------------------------------------------------------
<html>
<head>
<TITLE></TITLE>
</head>
<BODY BGCOLOR="#FFFFFF">
<TD NOWRAP WIDTH="45" VALIGN="top"><A HREF="javascript:openAWindow('http://www.registrar.psu.edu/faculty_staff/enroll_services/clsrooms.html#C','Intent',625,425,1)"><FONT FACE="Arial, Helvetica, sans-serif" SIZE="2"><strong>Tech Type</strong></FONT></A></TD>
<META HTTP-EQUIV="Refresh" CONTENT="0;url=/soc/fall/Alloz/a-c/acctg.html#">
---------------------------------------------------------------------------

As you can see, there are no closing </body> or </html> tags.

Thanks.

Stripped-down test code:
----------------------------------------
import sys

import mechanize
from mechanize import Browser

br = Browser()
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)   # ignore robots.txt
br.set_handle_refresh(True)   # ask mechanize to follow meta refresh
br.addheaders = [('User-Agent', 'Firefox')]

url = "http://schedule.psu.edu/act_main_search.cfm?Semester=FALL%202008%20%20%20%20&CrseLoc=OZ%3A%3AAbington%20Campus&CECrseLoc=AllOZ%3A%3AAbington%20Campus&CourseAbbrev=ACCTG&CourseNum=&CrseAlpha="

br.open(url)
res = br.response()  # this is a copy of response
s = res.read()
print("slen=%d" % len(s))
print(s)
sys.exit()
----------------------------------

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of John J Lee
Sent: Friday, August 29, 2008 12:34 PM
To: [EMAIL PROTECTED]
Cc: python-list@python.org
Subject: Re: [wwwsearch-general] (no subject)

On Fri, 29 Aug 2008, bruce wrote:
[...]
> does the page (test.html) need to be completely valid html?

No, but there are certainly (poorly-defined) limitations.

I haven't tried to understand your script or the HTML, but did you try this:

br = mechanize.Browser(mechanize.RobustFactory())
...

John
--
http://mail.python.org/mailman/listinfo/python-list