Brian D wrote:
Thanks MRAB as well. I've printed all of the replies to retain with my
pile of essential documentation.
To follow up with a complete response, I'm ripping out of my mechanize
module the essential components of the solution I got to work.
The main body of the code passes a URL to the scrape_records function.
The function attempts to open the URL five times.
If the URL is opened, a values dictionary is populated and returned to
the calling statement. If the URL cannot be opened, a fatal error is
printed and the module terminates. There's a little sleep call in the
function to leave time for any errant connection problem to resolve
itself.
Thanks to all for your replies. I hope this helps someone else:
import urllib2, time
from mechanize import Browser
def scrape_records(url):
maxattempts = 5
br = Browser()
user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:
1.9.0.16) Gecko/2009120208 Firefox/3.0.16 (.NET CLR 3.5.30729)'
br.addheaders = [('User-agent', user_agent)]
for count in xrange(maxattempts):
try:
print url, count
br.open(url)
break
except urllib2.URLError:
print 'URL error', count
# Pretend a failed connection was fixed
if count == 2:
url = 'http://www.google.com'
time.sleep(1)
pass
'pass' isn't necessary.
else:
print 'Fatal URL error. Process terminated.'
return None
# Scrape page and populate valuesDict
valuesDict = {}
return valuesDict
url = 'http://badurl'
valuesDict = scrape_records(url)
if valuesDict == None:
When checking whether or not something is a singleton, such as None, use
"is" or "is not" instead of "==" or "!=".
print 'Failed to retrieve valuesDict'
--
http://mail.python.org/mailman/listinfo/python-list