On 10/22/2010 10:03 PM, Sean DiZazzo wrote:
Hi,

I have some scripts that send files via ftplib to a client's ftp
site.  The scripts have generally worked great for a few years.
Recently, the client complained that only part of an important file
made it to their server.  My boss got this complaint and brought it to
my attention.

The first thing I did was track down the specific file transfer in my
logs.  My log showed a success, and I told my boss that, but he wasn't
satisfied with my response.  He began asking whether there is a record of
the file transfer ack and the number of bytes sent for this transfer.  I'm
not keeping a record of that, only success or failure (and some
output).

How can I assure him (and the client) that the transfer completed
successfully like my log shows?  I'm using code similar to the
following:

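import ftplib

try:
    ftp = ftplib.FTP(host)
    ftp.login(user, password)  # "pass" is a reserved word in Python
    with open(f.path, 'rb') as fh:  # close the local file when done
        ftp.storbinary("STOR " + destfile, fh)
    # log this as success
except ftplib.all_errors:
    pass  # log this as an error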

Is ftplib reliable enough to say that, if no exception is thrown,
the file was transferred in full?

   No.

   This was for years an outstanding problem with FTP under Windows.
See "http://www.fourmilab.ch/documents/corrupted_downloads/",
"http://us.generation-nt.com/answer/incomplete-ftp-upload-under-windows-xp-help-139017881.html",
and "http://winscp.net/forum/viewtopic.php?t=6458".  Many FTP
implementations have botched this.  TCP has all the machinery to
guarantee that both ends know the transfer completed
properly, but it's often misused.

   Looking at the Python source, it doesn't look good.  The "ftplib"
module does sending by calling sock_sendall in "socketmodule.c".
That does an OS-level "send", and once everything has been sent,
returns.

   But an OS-level socket send returns when the data is queued for
sending, not when it is delivered.  Only when the socket is closed,
and the close status checked, do you know if the data was delivered.
There's a final TCP close handshake that occurs when close has
been called at both ends, and only when it completes successfully
do you know that the data has been delivered.
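   The "queued, not delivered" point can be seen in a few lines.  This is
only a rough sketch: it uses a local socket.socketpair() rather than a real
TCP connection, and the function name is made up for illustration.  The
receiving end never calls recv(), yet sendall() reports no problem.

```python
import socket


def sendall_succeeds_without_reader(payload=b"x" * 1024):
    """Return True if sendall() reports success to a peer that never reads."""
    sender, receiver = socket.socketpair()
    try:
        # sendall() returns None once the kernel has accepted the data
        # into its buffers; nobody has read anything yet.
        result = sender.sendall(payload)
        # The peer goes away without ever calling recv().
        receiver.close()
        return result is None
    finally:
        receiver.close()  # harmless if already closed
        sender.close()
```

   The payload is kept small on purpose so that it fits in the kernel
buffer; a payload larger than the buffer would block instead of returning.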

   At the socket level, this is performed by "shutdown" (which
closes the connection and returns the proper network status
information), or by "close" (which forces a shutdown but doesn't
return status).
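   One common pattern for a sender that wants more confidence is to
half-close the socket and then wait for the other end to close too.  The
sketch below is an illustration, not what "ftplib" does: the name
send_and_confirm and the timeout value are invented here, and it assumes
the peer reads until end-of-file and then closes its side.  It tells you
the peer saw the end of the stream, not that the data reached disk.

```python
import socket


def send_and_confirm(sock, data, timeout=10.0):
    """Send data, half-close, then wait for the peer to close its side.

    Returns True once recv() reports end-of-file from the peer.
    Raises OSError (for example a timeout or connection reset) otherwise.
    """
    sock.settimeout(timeout)
    sock.sendall(data)               # data is queued, not yet confirmed
    sock.shutdown(socket.SHUT_WR)    # tell the peer no more data is coming
    while True:
        chunk = sock.recv(4096)      # drain anything the peer still sends
        if not chunk:                # b"" means the peer closed its side
            break
    sock.close()
    return True
```

   If the peer resets the connection or never closes, recv() raises
instead of returning b"", so the caller gets an exception to log rather
than a silent success.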

   Look at sock_close in "socketmodule.c".  Note that it ignores the
return status on close, always returns None, and never raises an
exception.  As the Linux manual page for "close" says: "Not checking
the return value of close() is a common but nevertheless serious
programming error. It is quite possible that errors on a previous
write(2) operation are first reported at the final close(). Not
checking the return value when closing the file may lead to silent
loss of data."

   "ftplib", in "storlines" and "storbinary", calls "close"
without calling "shutdown" first.  So if the other end disconnects
after all data has been queued but not received, the sender will
never know.  FAIL.

   So there's your bug.
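
   As for keeping a record of bytes sent: a possible approach, sketched
below, is to count bytes with the "callback" argument of storbinary and
then ask the server for the file size with FTP.size().  The helper name
upload_and_verify is hypothetical, and this assumes the server supports
the SIZE command (not all do, in which case size() raises an error).

```python
import os


def upload_and_verify(ftp, local_path, remote_name):
    """Upload a file and compare local, sent, and remote byte counts.

    `ftp` is expected to be a logged-in ftplib.FTP instance.
    Returns (ok, bytes_sent, remote_size) so all three can be logged.
    """
    sent = 0

    def count(block):
        # storbinary calls this once for every block it sends.
        nonlocal sent
        sent += len(block)

    with open(local_path, "rb") as fh:
        ftp.storbinary("STOR " + remote_name, fh, callback=count)

    ftp.voidcmd("TYPE I")            # many servers only answer SIZE in binary mode
    remote_size = ftp.size(remote_name)
    local_size = os.path.getsize(local_path)
    ok = (sent == local_size == remote_size)
    return ok, sent, remote_size
```

   Logging all three numbers gives you something concrete to show the
client, and a mismatch can be treated as a failed transfer and retried.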

                                John Nagle
--
http://mail.python.org/mailman/listinfo/python-list
