On Dec 30, 7:08 pm, MRAB <pyt...@mrabarnett.plus.com> wrote:
> Brian D wrote:
> > Thanks MRAB as well. I've printed all of the replies to retain with my
> > pile of essential documentation.
> >
> > To follow up with a complete response, I'm ripping out of my mechanize
> > module the essential components of the solution I got to work.
> >
> > The main body of the code passes a URL to the scrape_records function.
> > The function attempts to open the URL five times.
> >
> > If the URL is opened, a values dictionary is populated and returned to
> > the calling statement. If the URL cannot be opened, a fatal error is
> > printed and the module terminates. There's a little sleep call in the
> > function to leave time for any errant connection problem to resolve
> > itself.
> >
> > Thanks to all for your replies. I hope this helps someone else:
> >
> > import urllib2, time
> > from mechanize import Browser
> >
> > def scrape_records(url):
> >     maxattempts = 5
> >     br = Browser()
> >     user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.16) Gecko/2009120208 Firefox/3.0.16 (.NET CLR 3.5.30729)'
> >     br.addheaders = [('User-agent', user_agent)]
> >     for count in xrange(maxattempts):
> >         try:
> >             print url, count
> >             br.open(url)
> >             break
> >         except urllib2.URLError:
> >             print 'URL error', count
> >             # Pretend a failed connection was fixed
> >             if count == 2:
> >                 url = 'http://www.google.com'
> >             time.sleep(1)
> >             pass
>
> 'pass' isn't necessary.
>
> >     else:
> >         print 'Fatal URL error. Process terminated.'
> >         return None
> >     # Scrape page and populate valuesDict
> >     valuesDict = {}
> >     return valuesDict
> >
> > url = 'http://badurl'
> > valuesDict = scrape_records(url)
> > if valuesDict == None:
>
> When checking whether or not something is a singleton, such as None, use
> "is" or "is not" instead of "==" or "!=".
>
> >     print 'Failed to retrieve valuesDict'
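For comparison, here is a minimal Python 3 sketch of the same retry loop with both of MRAB's points applied (no `pass`, and an `is None` check in the caller). It assumes the standard library's `urllib.request` / `urllib.error` in place of mechanize and urllib2. The function name `fetch_with_retries`, the `opener` parameter, and the stand-in `always_down` opener are hypothetical, added only so the loop can be exercised without touching the network. The `for`/`else` structure mirrors the original: the `else` clause runs only if the loop finishes without hitting `break`.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retries(url, opener=urllib.request.urlopen,
                       max_attempts=5, delay=1.0):
    """Open `url`, retrying on urllib.error.URLError.

    Returns whatever `opener` returns on success, or None if all
    `max_attempts` attempts fail. `opener` is a hypothetical hook so the
    loop can be tested without a real connection.
    """
    for attempt in range(max_attempts):
        try:
            response = opener(url)
            break  # success: skip the loop's else clause
        except urllib.error.URLError as exc:
            print('URL error on attempt', attempt, exc.reason)
            # Pause before retrying, but not after the final failure.
            if attempt < max_attempts - 1:
                time.sleep(delay)
    else:
        # Runs only when the loop ends without `break`,
        # i.e. every attempt raised URLError.
        print('Fatal URL error after', max_attempts, 'attempts.')
        return None
    return response


def always_down(url):
    """Hypothetical stand-in opener that simulates a dead host."""
    raise urllib.error.URLError('simulated outage')


# Caller: compare against the None singleton with `is`, not `==`.
values = fetch_with_retries('http://badurl', opener=always_down,
                            max_attempts=3, delay=0)
if values is None:
    print('Failed to retrieve valuesDict')
```

Skipping the sleep after the last failed attempt is a small change from the original, which always sleeps; it just avoids a pointless one-second wait before giving up.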

I'm definitely acquiring some well-deserved schooling -- and it's really appreciated. I'd seen the "is/is not" preference before, but it just didn't stick. I see now that "pass" is redundant -- thanks for catching that.

Cheers.

--
http://mail.python.org/mailman/listinfo/python-list
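The reason behind the "is/is not" preference fits in a few lines of Python 3. `==` is dispatched to the left operand's `__eq__` method, which any class is free to override, while `is` compares object identity and cannot be overridden. The `AlwaysEqual` class below is a hypothetical, deliberately misbehaving example.

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ claims equality with anything."""

    def __eq__(self, other):
        return True


obj = AlwaysEqual()
print(obj == None)  # True: the overridden __eq__ answers the question
print(obj is None)  # False: identity is checked directly
```

Because None is a singleton, `x is None` tests exactly what is meant, and it is the form PEP 8 recommends for comparisons to None.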