Hello list,

I want to limit the download speed when using urllib2. In particular, when running several parallel downloads, I want to make sure that their total speed doesn't exceed a maximum value.
I can't find a simple way to achieve this. After some research I can try a few things, but I'm stuck on the details:

1) Can I override some method in _socket.py to achieve this, and perhaps make it generic enough to work with libraries other than urllib2? (Sketch 1 at the end of this message is what I have in mind.)

2) There is the urllib.urlretrieve() function, which accepts a reporthook parameter. Perhaps I can have the reporthook increment a global counter and sleep as necessary whenever a threshold is reached (sketch 2 below). However, there is nothing similar in urllib2. Isn't urllib2 supposed to be a superset of urllib in functionality? Why is there no reporthook parameter in any of urllib2's functions? Moreover, even the existing reporthook interface doesn't seem quite right: reporthook(blocknum, bs, size) is always called with bs=8K, even for the last block, and blocknum*bs > size is possible if the server sends a wrong Content-Length HTTP header.

3) Perhaps I can call filehandle.read(1024) and manually read as many chunks of data as I need (sketch 3 below). However, I suspect this would generally be inefficient, and I'm not sure how it would interact with urllib2's internal buffering.

So how do you think I can achieve rate limiting in urllib2?

Thanks in advance,
Dimitris

P.S. And something simpler: how can I disallow urllib2 from following redirections to foreign hosts? (Sketch 4 below is my rough attempt.)
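
Sketch 1: patching the socket layer. This is only a rough idea, and it assumes CPython 2's socket.py internals, where makefile() builds a socket._fileobject whose read() pulls from the real socket; httplib (and therefore urllib2) reads response bodies through such a file object. The 64 KB/s figure is a placeholder. Note it only paces read(), not readline(), and it only smooths the rate if the consumer reads in smallish chunks (as in sketch 3):

import socket
import threading
import time

class Throttle:
    """Global byte budget shared by every connection."""
    def __init__(self, rate):           # rate in bytes/second
        self.rate = float(rate)
        self.lock = threading.Lock()
        self.consumed = 0
        self.start = time.time()

    def delay(self, nbytes):
        self.lock.acquire()
        try:
            self.consumed += nbytes
            # how far ahead of schedule are we?
            ahead = self.consumed / self.rate - (time.time() - self.start)
        finally:
            self.lock.release()
        if ahead > 0:
            time.sleep(ahead)

throttle = Throttle(64 * 1024)          # 64 KB/s total; placeholder value

_real_read = socket._fileobject.read

def _throttled_read(self, size=-1):
    # account for whatever the real read() returned, then pause
    data = _real_read(self, size)
    throttle.delay(len(data))
    return data

socket._fileobject.read = _throttled_read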
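
Sketch 2: the reporthook idea with urllib.urlretrieve(). Since the hook only sees the requested block size bs (not the actual byte count), the accounting is approximate; with parallel downloads the counter would also need a lock like the one in sketch 1. The URL, filename, and rate are placeholders:

import time
import urllib

MAX_BPS = 64 * 1024        # total budget in bytes/second; placeholder
start = time.time()
consumed = [0]             # a list so the hook can mutate it

def hook(blocknum, bs, size):
    consumed[0] += bs      # approximate: bs is the requested size (8K)
    ahead = consumed[0] / float(MAX_BPS) - (time.time() - start)
    if ahead > 0:          # ahead of schedule: sleep it off
        time.sleep(ahead)

urllib.urlretrieve('http://example.com/big.iso', 'big.iso',
                   reporthook=hook)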
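
Sketch 3: the manual read() loop. This much I can write today; example.com and the 32 KB/s rate are placeholders. For several parallel downloads sharing one budget, got/start would become a shared, locked counter as in sketch 1:

import time
import urllib2

def fetch_limited(url, path, rate):
    # download url to path at roughly `rate` bytes/second
    src = urllib2.urlopen(url)
    dst = open(path, 'wb')
    start = time.time()
    got = 0
    try:
        while True:
            chunk = src.read(8192)     # small chunks give smooth pacing
            if not chunk:
                break
            dst.write(chunk)
            got += len(chunk)
            ahead = got / float(rate) - (time.time() - start)
            if ahead > 0:              # ahead of schedule: sleep it off
                time.sleep(ahead)
    finally:
        dst.close()
        src.close()

fetch_limited('http://example.com/big.iso', 'big.iso', 32 * 1024)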
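
Sketch 4: my rough attempt at refusing foreign-host redirects, by subclassing urllib2.HTTPRedirectHandler and overriding redirect_request(). I'm not sure this is the intended extension point, so treat it as a guess; the host comparison is also naive about ports and userinfo:

import urllib2
import urlparse

class SameHostRedirectHandler(urllib2.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # refuse redirects whose target host differs from the original
        new_host = urlparse.urlparse(newurl)[1]       # the netloc part
        if new_host and new_host != req.get_host():
            raise urllib2.HTTPError(req.get_full_url(), code,
                                    'redirect to foreign host refused',
                                    headers, fp)
        return urllib2.HTTPRedirectHandler.redirect_request(
            self, req, fp, code, msg, headers, newurl)

opener = urllib2.build_opener(SameHostRedirectHandler)
data = opener.open('http://example.com/').read()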