On Aug 10, 10:21 am, samwyse <samw...@gmail.com> wrote: > On Aug 9, 9:41 am, Steven D'Aprano <st...@remove-this- > > cybersource.com.au> wrote: > > On Sun, 09 Aug 2009 06:13:38 -0700,samwysewrote: > > > Here's what I have so far: > > > > import urllib > > > > class AppURLopener(urllib.FancyURLopener): > > > version = "App/1.7" > > > referrer = None > > > def __init__(self, *args): > > > urllib.FancyURLopener.__init__(self, *args) > > > if self.referrer: > > > addheader('Referer', self.referrer) > > > > urllib._urlopener = AppURLopener() > > > > Unfortunately, the 'Referer' header potentially varies for each url that > > > I retrieve, and the way the module is written, I can't change the calls > > > to __init__ or open. The best idea I've had is to assign a new value to > > > my class variable just before calling urllib.urlretrieve(), but that > > > just seems ugly. Any ideas? Thanks. > > > [Aside: an int variable is an int. A str variable is a str. A list > > variable is a list. A class variable is a class. You probably mean a > > class attribute, not a variable. If other languages want to call it a > > variable, or a sausage, that's their problem.] > > > If you're prepared for a bit of extra work, you could take over all the > > URL handling instead of relying on automatic openers. This will give you > > much finer control, but it will also require more effort on your part. > > The basic idea is, instead of installing openers, and then ask the urllib > > module to handle the connection, you handle the connection yourself: > > > make a Request object using urllib2.Request > > make an Opener object using urllib2.build_opener > > call opener.open(request) to connect to the server > > deal with the connection (retry, fail or read) > > > Essentially, you use the Request object instead of a URL, and you would > > add the appropriate referer header to the Request object. > > > Another approach, perhaps a more minimal change than the above, would be > > something like this: > > > # untested > > class AppURLopener(urllib.FancyURLopener): > > version = "App/1.7" > > def __init__(self, *args): > > urllib.FancyURLopener.__init__(self, *args) > > def add_referrer(self, url=None): > > if url: > > addheader('Referer', url) > > > urllib._urlopener = AppURLopener() > > urllib._urlopener.add_referrer("http://example.com/") > > Thanks for the ideas. I'd briefly considered something similar to > your first idea, implementing my own version of urlretrieve to accept > a Request object, but it does seem like a good bit of work. Maybe > over Labor Day. :) > > The second idea is pretty much what I'm going to go with for now. The > program that I'm writing is almost a clone of wget, but it fixes some > personal dislikes with the way recursive retrievals are done. (Or > maybe I just don't understand wget's full array of options well > enough.) This means that my referrer changes as I bounce up and down > the hierarchy, which makes this less convenient. Still, it does seem > more convenient that re-writing the module from scratch.
Just wanted to add a note. I used the sample code posted above, and I would get this syntax error: NameError: global name 'addheader' is not defined The fix for the code is to change the line that references addheader to say this: self.addheader('Referer', url) ~E -- http://mail.python.org/mailman/listinfo/python-list