Bugs item #1016880, was opened at 2004-08-26 15:58
Message generated for change (Comment added) made by birkenfeld
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1016880&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 6
Submitted By: David Abrahams (david_abrahams)
>Assigned to: Reinhold Birkenfeld (birkenfeld)
Summary: urllib.urlretrieve silently truncates downloads

Initial Comment:
The following script appears to be unreliable on all versions of
Python we can find. The file being downloaded is approximately 34
MB. Browsers such as IE and Mozilla have no problem downloading
the whole thing.

----
import urllib
import os
os.chdir('/tmp')
urllib.urlretrieve('http://cvs.sourceforge.net/cvstarballs/boost-cvsroot.tar.bz2',
                   'boost-cvsroot.tar.bz2')

----------------------------------------------------------------------

>Comment By: Reinhold Birkenfeld (birkenfeld)
Date: 2005-08-24 20:49

Message:
Logged In: YES
user_id=1188172

Fixed wrt patch #1062060.

----------------------------------------------------------------------

Comment By: Irmen de Jong (irmen)
Date: 2004-12-24 15:30

Message:
Logged In: YES
user_id=129426

Suggested addition to the doc of urllib (liburllib.tex, if I'm
not mistaken):

"""
urlretrieve will raise IOError when it detects that the amount
of data available was less than the expected amount (which is
the size reported by a Content-Length header). This can occur,
for example, when the download is interrupted.

The Content-Length is treated as a lower bound (just like tools
such as wget and Firefox appear to do): if there's more data to
read, urlretrieve reads more data, but if less data is
available, it raises IOError.

If no Content-Length header was supplied, urlretrieve cannot
check the size of the data it has downloaded, and just returns
it. In this case you just have to assume that the download was
successful.
""" ---------------------------------------------------------------------- Comment By: Irmen de Jong (irmen) Date: 2004-11-07 21:17 Message: Logged In: YES user_id=129426 a patch is at 1062060 (raises IOError when download is incomplete) ---------------------------------------------------------------------- Comment By: Irmen de Jong (irmen) Date: 2004-11-07 20:47 Message: Logged In: YES user_id=129426 Confirmed here (mandrakelinux 10.0, python 2.4b2) However, I doubt it is a problem in urllib.urlretrieve, because I tried downloading the file with wget, and got the following: [EMAIL PROTECTED] tmp]$ wget -S http://cvs.sourceforge.net/cvstarballs/boost-cvsroot.tar.bz2 --20:38:11-- http://cvs.sourceforge.net/cvstarballs/boost-cvsroot.tar.bz2 => `boost-cvsroot.tar.bz2.1' Resolving cvs.sourceforge.net... 66.35.250.207 Connecting to cvs.sourceforge.net[66.35.250.207]:80... connected. HTTP request sent, awaiting response... 1 HTTP/1.1 200 OK 2 Date: Sun, 07 Nov 2004 19:38:15 GMT 3 Server: Apache/2.0.40 (Red Hat Linux) 4 Last-Modified: Sat, 06 Nov 2004 15:11:39 GMT 5 ETag: "b63d5b-25c3808-687d80c0" 6 Accept-Ranges: bytes 7 Content-Length: 39598088 8 Content-Type: application/x-bzip2 9 Connection: close 31% [=======================> ] 12,665,616 60.78K/s ETA 03:55 20:40:07 (111.60 KB/s) - Connection closed at byte 12665616. Retrying. --20:40:08-- http://cvs.sourceforge.net/cvstarballs/boost-cvsroot.tar.bz2 (try: 2) => `boost-cvsroot.tar.bz2.1' Connecting to cvs.sourceforge.net[66.35.250.207]:80... connected. HTTP request sent, awaiting response... ....... so the remote server just closed the connection halfway trough! I suspect that a succesful download is sheer luck. Also, the download loop in urllib looks fine to me. It only stops when the read() returns an empty result, and that means EOF. 
----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2004-08-26 22:04

Message:
Logged In: YES
user_id=80475

Followed the same procedure (no chdir, add a hook) but bombed
out at 9.1Mb:

. . .
(1117, 8192, 34520156)
('boost-cvsroot.tar.bz2', <httplib.HTTPMessage instance at 0x00B1E4B8>)

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2004-08-26 20:52

Message:
Logged In: YES
user_id=31435

Hmm. I don't know anything about this, but thought I'd just
try it. Didn't chdir(), did add a reporthook:

def hook(*args):
    print args

WinXP Pro SP1, current CVS Python, cable modem over a wireless
router. Output looked like this:

(0, 8192, 34520156)
(1, 8192, 34520156)
(2, 8192, 34520156)
...
(4213, 8192, 34520156)
(4214, 8192, 34520156)
(4215, 8192, 34520156)

Had the whole file when it ended:

> wc boost-cvsroot.tar.bz2
 125368  765656 34520156 boost-cvsroot.tar.bz2

*Maybe* adding the reporthook changed timing in some crucial
way. Don't know.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2004-08-26 19:09

Message:
Logged In: YES
user_id=80475

Confirmed. On Py2.4 (current CVS), I got 12.7 Mb before the
connection closed.

----------------------------------------------------------------------