On 9/2/2013 1:43 PM, John Nagle wrote:
     I'm reading files from an FTP server at the U.S. Securities and
Exchange Commission.  This code has been running successfully for
years.  Recently, they imposed a consistent connection delay
of 20 seconds at FTP connection, presumably because they're having
some denial of service attack.  Python 2.7 urllib2 doesn't
seem to use the timeout specified.  After 20 seconds, it
gives up and times out.

Here's the traceback:

Internal error in EDGAR update: <urlopen error ftp error: [Errno 110]
Connection timed out>
....
   File "./edgar/edgarnetutil.py", line 53, in urlopen

   File "/opt/python27/lib/python2.7/socket.py", line 571, in
create_connection
...
     raise err
URLError: <urlopen error ftp error: [Errno 110] Connection timed out>

Periodic update completed in 21.1 seconds.
----------------------------------------------

Here's the relevant code:

TIMEOUTSECS = 60        ## give up waiting for server after 60 seconds
...
def urlopen(url,timeout=TIMEOUTSECS) :
    if url.endswith(".gz") : # gzipped file, must decompress first
        nd = urllib2.urlopen(url,timeout=timeout)      # get connection
        ... # (NOT .gz FILE, DOESN'T TAKE THIS PATH)
    else :
        return(urllib2.urlopen(url,timeout=timeout)) # (OPEN FAILS)

I looked at the 3.3 urllib.request.urlopen code. The timeout is passed through a couple of layers, but it is hard to see whether it reaches the socket connection call. I would also try Python 3.3, as the timeout handling may have changed a bit.
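
Rather than reading through the layers, one way to check whether the timeout reaches the socket call is to wrap socket.create_connection temporarily and record what it is handed. This is only a sketch: record_connect_timeouts is a hypothetical helper, not part of any library, and it assumes the code under test reaches the socket through socket.create_connection (as ftplib's connect() does).

```python
import contextlib
import socket


@contextlib.contextmanager
def record_connect_timeouts(log):
    """Hypothetical helper: log the timeout given to socket.create_connection."""
    original = socket.create_connection

    def wrapper(address, *args, **kwargs):
        # The timeout is the second positional argument, or a keyword.
        if args:
            timeout = args[0]
        else:
            timeout = kwargs.get("timeout", "<default>")
        log.append((address, timeout))
        return original(address, *args, **kwargs)

    socket.create_connection = wrapper
    try:
        yield log
    finally:
        # Always restore the real function, even if the connection fails.
        socket.create_connection = original
```

Wrapping the failing urlopen(url, timeout=60) call in "with record_connect_timeouts([]) as log:" and printing log afterwards would show whether 60 arrived at the socket layer or some other value was substituted along the way.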

There are some 'timeout' issues on the tracker, such as
http://bugs.python.org/issue4079
http://bugs.python.org/issue18417
but these do not obviously apply to an explicitly passed timeout.

I would also try using ftplib directly, which cuts out many of the general-purpose layers of urlopen. FTP.__init__ stores timeout in self.timeout and calls connect(), which passes self.timeout to socket.create_connection.

>>> import ftplib
>>> ftp = ftplib.FTP("ftp.sec.gov")
>>> ftp.login()
'230-Anonymous access granted, restrictions apply\n \n Please read the file README.txt\n230 it was last modified on Tue Aug 15 14:29:31 2000 - 4765 days ago'
>>> ftp.sendcmd('help')
"214-The following commands are recognized (* =>'s unimplemented):\n CWD XCWD CDUP XCUP SMNT* QUIT PORT PASV \n EPRT EPSV ALLO* RNFR RNTO DELE MDTM RMD \n XRMD MKD XMKD PWD XPWD SIZE SYST HELP \n NOOP FEAT OPTS AUTH* CCC* CONF* ENC* MIC* \n PBSZ* PROT* TYPE STRU MODE RETR STOR STOU \n APPE REST ABOR USER PASS ACCT* REIN* LIST \n NLST STAT SITE MLSD MLST \n214 Direct comments to r...@clone11.sec.gov"

I tried to read 'README.txt', but I do not know how to use the server commands or the local FTP methods.
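
For reference, ftplib's retrlines() method is the usual way to fetch a text file: it sends the RETR command and calls a callback once per line. The following is a minimal sketch, not tested against the SEC server. The helper names fetch_text_lines and read_remote_text are made up for illustration, and TIMEOUT_SECS simply mirrors the 60 seconds used in the quoted code.

```python
import ftplib

TIMEOUT_SECS = 60  # assumed value, mirroring TIMEOUTSECS in the quoted code


def fetch_text_lines(ftp, filename):
    """Return the lines of a text file using an already logged-in FTP object."""
    lines = []
    # retrlines() transfers in ASCII mode and calls the callback once per
    # line, with the line terminator already stripped.
    ftp.retrlines("RETR " + filename, lines.append)
    return lines


def read_remote_text(host, filename, timeout=TIMEOUT_SECS):
    """Connect anonymously to host, fetch filename, and return its lines."""
    # Passing host makes FTP.__init__ call connect(), which hands the
    # timeout to socket.create_connection.
    ftp = ftplib.FTP(host, timeout=timeout)
    try:
        ftp.login()  # anonymous login, as in the session above
        return fetch_text_lines(ftp, filename)
    finally:
        ftp.close()
```

Calling read_remote_text("ftp.sec.gov", "README.txt") would exercise the same connect path with an explicit timeout, which makes it a useful comparison against the urllib2 behaviour.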

TIMEOUTSECS used to be 20 seconds, and I increased it to 60. It didn't
help.

This isn't an OS problem. The above traceback was on a Linux system.
On Windows 7, it fails with

"URLError: <urlopen error ftp error: [Errno 10060] A connection attempt
failed because the connected party did not properly respond after a
period of time, or established connection failed because connected host
has failed to respond>"

But in both cases, the command line FTP client will work, after a
consistent 20 second delay before the login prompt.  So the
Python timeout parameter isn't working.


--
Terry Jan Reedy
