[issue1067702] urllib fails with multiple ftp transfers
Sohaib Ahmad added the comment:

The problem is reproducible on the latest Python 2.7 release (2.7.12). I tried the same scenario on 2.7.10 and it worked fine. I am not sure whether this issue can be reopened or whether I should create a new one.

In my case the first transfer succeeds but the second FTP transfer fails with the error:

    [Errno ftp error] 200 Type set to I

I am using urllib.urlretrieve(url, local_path) to retrieve two files (one by one) from an FTP server.

--
nosy: +Sohaib Ahmad

___
Python tracker <http://bugs.python.org/issue1067702>
___
___
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue27973] urllib.urlretrieve() fails on second ftp transfer
New submission from Sohaib Ahmad:

urllib.urlretrieve() fails on FTP in the following scenario:

- start and complete a transfer
- immediately start another transfer

The second transfer fails with the following error:

    [Errno ftp error] 200 Type set to I

I am using urllib.urlretrieve(url, filename) to retrieve two files (one by one) from an FTP server. Sample code to reproduce the problem is attached. Please update url1 and url2 with correct values.

This problem was reported several years ago and was fixed, but it is now reproducible on the latest Python 2.7 release (2.7.12):

http://bugs.python.org/issue1067702

I tried the same scenario on 2.7.10 and it worked fine, so a patch after 2.7.10 must have broken something.

--
components: Library (Lib)
files: multiple_ftp_download.py
messages: 274559
nosy: Sohaib Ahmad
priority: normal
severity: normal
status: open
title: urllib.urlretrieve() fails on second ftp transfer
type: behavior
versions: Python 2.7
Added file: http://bugs.python.org/file44396/multiple_ftp_download.py

___
Python tracker <http://bugs.python.org/issue27973>
___
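The reproduction steps described above can be sketched roughly as follows. This is not the attached multiple_ftp_download.py, which is not reproduced in this archive. The helper name download_all and its retrieve parameter are hypothetical and exist only so the loop can be exercised without a network. The import fallback is there because urlretrieve lives in urllib on Python 2.7 and in urllib.request on Python 3.

```python
import os

try:  # Python 3
    from urllib.request import urlretrieve
except ImportError:  # Python 2.7, the version discussed in this thread
    from urllib import urlretrieve


def download_all(urls, local_folder, retrieve=urlretrieve):
    """Download each URL in turn and return the list of local paths.

    `retrieve` is a hypothetical hook; by default it is urlretrieve.
    """
    paths = []
    for url in urls:
        # Name the local file after the last path component of the URL.
        local_path = os.path.join(local_folder, url.rsplit("/", 1)[-1])
        # urlretrieve returns (filename, headers); keep only the filename.
        paths.append(retrieve(url, local_path)[0])
    return paths
```

Calling download_all() with two FTP URLs that share the same host and directory is the scenario in which the second call is reported to fail on 2.7.12.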
[issue27973] urllib.urlretrieve() fails on second ftp transfer
Sohaib Ahmad added the comment:

I am not very familiar with Mercurial. I will try to set up the development environment.

The traceback is:

    [Errno ftp error] 200 Switching to Binary mode.
    Traceback (most recent call last):
      File "multiple_ftp_download.py", line 49, in main
        file2_path = download_from_url(url2, local_folder=tmpDir)
      File "multiple_ftp_download.py", line 32, in download_from_url
        filename = urllib.urlretrieve(url, local_path)[0]
      File "C:\Python27\lib\urllib.py", line 98, in urlretrieve
        return opener.retrieve(url, filename, reporthook, data)
      File "C:\Python27\lib\urllib.py", line 245, in retrieve
        fp = self.open(url, data)
      File "C:\Python27\lib\urllib.py", line 213, in open
        return getattr(self, name)(url)
      File "C:\Python27\lib\urllib.py", line 558, in open_ftp
        (fp, retrlen) = self.ftpcache[key].retrfile(file, type)
      File "C:\Python27\lib\urllib.py", line 906, in retrfile
        conn, retrlen = self.ftp.ntransfercmd(cmd)
      File "C:\Python27\lib\ftplib.py", line 334, in ntransfercmd
        host, port = self.makepasv()
      File "C:\Python27\lib\ftplib.py", line 312, in makepasv
        host, port = parse227(self.sendcmd('PASV'))
      File "C:\Python27\lib\ftplib.py", line 830, in parse227
        raise error_reply, resp
    IOError: [Errno ftp error] 200 Switching to Binary mode.
[issue27973] urllib.urlretrieve() fails on second ftp transfer
Sohaib Ahmad added the comment:

Thank you for pointing me towards hg bisect. I got some time to look into it and was able to find the commit that broke this functionality.

A fix from Python 3 was backported in the issue "urllib hangs when closing connection", and it removed a call to ftp.voidresp(). Without this call, the second download using urlretrieve() now fails in 2.7.12.

Issue ID: http://bugs.python.org/issue26960
Commit ID: https://hg.python.org/cpython/rev/44d02a5d59fb

voidresp() itself calls getresp(), so issue26960 could have occurred because control never returns from getresp().

In my opinion this commit (101286) should be reverted, and getresp() should be updated with some sort of timeout to fix issue26960.
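To illustrate why the voidresp() call matters, here is a minimal sketch of a single binary download written directly against ftplib, assuming an already connected and logged-in ftplib.FTP object. The function name fetch is hypothetical, and this is not the code in urllib.py. It follows the same general order that ftplib's own retrbinary() uses: read the data connection to the end, close it, then consume the final reply on the control connection.

```python
def fetch(ftp, filename, blocksize=8192):
    """Retrieve one file over an existing ftplib.FTP-like connection.

    `fetch` is a hypothetical helper for illustration only.
    """
    ftp.voidcmd("TYPE I")  # switch to binary mode
    conn, _size = ftp.ntransfercmd("RETR " + filename)
    chunks = []
    try:
        while True:
            block = conn.recv(blocksize)
            if not block:  # empty read: the server closed the data socket
                break
            chunks.append(block)
    finally:
        conn.close()
    # Consume the final "226 Transfer complete." reply so that it is not
    # left queued on the control connection for the next command.
    ftp.voidresp()
    return b"".join(chunks)
```

For the timeout idea mentioned above, ftplib.FTP accepts a timeout argument at construction time, for example ftplib.FTP(host, timeout=30). Whether that alone would avoid the hang in issue26960 is not established in this thread.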
[issue27973] urllib.urlretrieve() fails on second ftp transfer
Sohaib Ahmad added the comment:

I didn't know that urllib.urlopen() retrieves the complete object in the case of FTP.

When getresp() is called for big files (such as the one in issue26960), the RETR command has been issued and the server has returned code 150, which means "stand by for another reply". That is where control got stuck and issue26960 was reported.

This is the end of the debug log with the file mentioned in issue26960, after which control got stuck:

    *cmd* 'TYPE I'
    *put* 'TYPE I\r\n'
    *get* '200 Type set to I\r\n'
    *resp* '200 Type set to I'
    *cmd* 'PASV'
    *put* 'PASV\r\n'
    *get* '227 Entering Passive Mode (130,133,3,130,207,26).\r\n'
    *resp* '227 Entering Passive Mode (130,133,3,130,207,26).'
    *cmd* 'RETR ratings.list.gz'
    *put* 'RETR ratings.list.gz\r\n'
    *get* '150 Opening BINARY mode data connection for ratings.list.gz (12643237 bytes)\r\n'
    *resp* '150 Opening BINARY mode data connection for ratings.list.gz (12643237 bytes)'

And this is the end of the debug log of a very small file transfer over FTP:

    *cmd* 'PASV'
    *put* 'PASV\r\n'
    *get* '227 Entering Passive Mode (130,239,18,165,234,243).\r\n'
    *resp* '227 Entering Passive Mode (130,239,18,165,234,243).'
    *cmd* 'RETR Contents-udeb-ppc64el.gz'
    *put* 'RETR Contents-udeb-ppc64el.gz\r\n'
    *get* '150 Opening BINARY mode data connection for Contents-udeb-ppc64el.gz (26555 bytes).\r\n'
    *resp* '150 Opening BINARY mode data connection for Contents-udeb-ppc64el.gz (26555 bytes).'
    *get* '226 Transfer complete.\r\n'
    *resp* '226 Transfer complete.'

Control returned successfully once the FTP server sent the 2xx reply.

Please correct me if I am wrong, but from the RETR command it looks like it is trying to retrieve the whole file in both cases. Is urlopen() supposed to retrieve files when called, or just get the headers/information etc.?
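The distinction between the 150 and 226 replies in the logs above follows the FTP convention that the first digit of a reply code gives its class. A small sketch of that classification is shown below. The function name reply_kind is hypothetical, and the labels paraphrase the reply classes defined in RFC 959.

```python
# First digit of an FTP reply code mapped to its class (per RFC 959).
_REPLY_KINDS = {
    "1": "positive preliminary",   # more replies will follow, e.g. 150
    "2": "positive completion",    # the command finished, e.g. 200, 226, 227
    "3": "positive intermediate",  # the server needs another command
    "4": "transient negative completion",
    "5": "permanent negative completion",
}


def reply_kind(resp):
    """Classify an FTP reply line such as '226 Transfer complete.'."""
    return _REPLY_KINDS.get(resp[:1], "unknown")
```

Under this reading, a client that has only seen the 150 reply should still expect a later completion reply on the control connection, which is what the small-file log shows with 226.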
[issue27973] urllib.urlretrieve() fails on second ftp transfer
Sohaib Ahmad added the comment:

I manually reverted the issue26960 patch, which fixed my issue with consecutive downloads, but it also caused a regression of issue26960.

I am looking into what could be causing this hang when voidresp() is called, using the demo available in issue26960. It looks like the following happens when urlopen() is called:

    urlopen() > URLopener.open() > URLopener.open_ftp() > ftpwrapper.retrfile() > FTP.ntransfercmd()

retrfile() calls FTP.ntransfercmd() in ftplib, which sends the RETR command to the FTP server. If I understand correctly, RETR means "retrieve a copy of the file from the FTP server".

If RETR does retrieve the complete file, then I think the behavior after reverting the issue26960 patch is fine, and the hang would be expected for large files.

I think we can fix this freeze for large files, but I have two questions regarding this:

1) Is urlopen() supposed to download complete files? From the Python documentation, it looks like it only returns a network object, or raises an exception in the case of an invalid URL.

2) If it is not supposed to download complete files, can we switch to LIST instead of RETR for FTP files?

I'd be grateful if a urllib / ftplib expert could answer the above questions.
[issue27973] urllib.urlretrieve() fails on second ftp transfer
Sohaib Ahmad added the comment:

The attached patch fixes the problem with multiple FTP downloads while keeping the fix for issue1067702 intact.

The fix uses a new parameter, ftp_retrieve, to change the behavior of ftpwrapper.retrfile() when it is called by urlretrieve().

I am not familiar with the process of contributing a patch to the Python repository, so please review and commit the attached urllib.patch file.

Tested with urlopen (https, http, ftp) and urlretrieve (ftp).

--
keywords: +patch
Added file: http://bugs.python.org/file44692/urllib.patch
[issue27973] urllib.urlretrieve() fails on second ftp transfer
Sohaib Ahmad added the comment:

Hi Senthil,

Thanks for the review. Now that I look at it, even with a default value, an ftp specific parameter sure does break the open() API abstraction.
[issue27973] urllib.urlretrieve() fails on second ftp transfer
Changes by Sohaib Ahmad:

Removed file: http://bugs.python.org/file44692/urllib.patch
[issue27973] urllib.urlretrieve() fails on second ftp transfer
Sohaib Ahmad added the comment:

I finally found the actual problem causing the failure of the second download.

urlretrieve() works with FTP in PASV mode, and in PASV mode, after sending the file to the client, the FTP server sends an ACK that the file has been transferred. After the fix of issue1067702, the socket was being closed without receiving this ACK.

Now, when a user tries to download the same file or another file from the same directory, the key (host, port, dirs) remains the same, so open_ftp() skips FTP initialization. Because of this, the previous FTP connection is reused, and when new commands are sent to the server, the server first sends the previous ACK. This causes a domino effect: each response is delayed by one, and we get an exception from parse227().

Expected response:

    *cmd* 'RETR Contents-udeb-ppc64el.gz'
    *resp* '150 Opening BINARY mode data connection for Contents-udeb-ppc64el.gz (26555 bytes).'
    *resp* '226 Transfer complete.'
    *cmd* 'TYPE I'
    *resp* '200 Switching to Binary mode.'
    *cmd* 'PASV'
    *resp* '227 Entering Passive Mode (130,239,18,173,137,59).'

Actual response:

    *cmd* 'RETR Contents-udeb-ppc64el.gz'
    *resp* '150 Opening BINARY mode data connection for Contents-udeb-ppc64el.gz (26555 bytes).'
    *cmd* 'TYPE I'
    *resp* '226 Transfer complete.'
    *cmd* 'PASV'
    *resp* '200 Switching to Binary mode.'

I am attaching a new patch (urllib.patch) which fixes this problem by first clearing the pending FTP server responses when an existing connection is being reused to download a file.

Please review and let me know if it looks good.

--
Added file: http://bugs.python.org/file44712/urllib.patch
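The off-by-one effect described above can be reproduced without a network using a toy model of the control connection. In the sketch below, FakeControlChannel and second_transfer are hypothetical names, and the canned replies are copied from the debug log in this message. Only ftplib.parse227() is a real library function. This models the symptom only and is not the contents of urllib.patch.

```python
import ftplib
from collections import deque


class FakeControlChannel(object):
    """Hypothetical stand-in for an FTP control connection."""

    def __init__(self):
        self.pending = deque()  # replies the server has sent but we have not read

    def send(self, cmd):
        # Queue canned replies modeled on the debug log quoted in the thread.
        if cmd.startswith("RETR"):
            self.pending.append("150 Opening BINARY mode data connection")
            self.pending.append("226 Transfer complete.")
        elif cmd == "TYPE I":
            self.pending.append("200 Switching to Binary mode.")
        elif cmd == "PASV":
            self.pending.append(
                "227 Entering Passive Mode (130,239,18,173,137,59).")

    def recv(self):
        return self.pending.popleft()


def second_transfer(drain_first):
    """Start a second transfer on a reused connection.

    If drain_first is false, the 226 reply of the first transfer is left
    unread, so every later reply is read one command too late.
    """
    chan = FakeControlChannel()
    chan.send("RETR Contents-udeb-ppc64el.gz")
    chan.recv()  # the 150 reply
    if drain_first:
        chan.recv()  # the 226 reply, consumed before reusing the connection
    chan.send("TYPE I")
    chan.recv()
    chan.send("PASV")
    # parse227() raises ftplib.error_reply unless the reply starts with 227.
    return ftplib.parse227(chan.recv())
```

With drain_first=True the PASV reply is parsed into a host and port. With drain_first=False, parse227() receives '200 Switching to Binary mode.' and raises ftplib.error_reply, which matches the exception in the traceback posted earlier in this thread.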
[issue27973] urllib.urlretrieve() fails on second ftp transfer
Sohaib Ahmad added the comment:

Can someone please review this patch so that it would be in 2.7.13 when it comes out?
[issue27973] urllib.urlretrieve() fails on second ftp transfer
Sohaib Ahmad added the comment:

@Senthil, thanks for looking into this. Looking forward to your commit.

Regards.