A long reply, because I fear the Python module may be misunderstanding
what SSL_shutdown() actually does and why it exists, which in turn means
that users of the Python module also misuse it (castles built on sand and
all that).
Antoine Pitrou wrote:
While testing Python's SSL support with OpenSSL >= 0.9.8m, we have
encountered a strange error return from SSL_shutdown on a non-blocking
socket (note: this is a different problem from the one described by
Victor Stinner in an earlier thread last month). Basically:
- SSL_shutdown(<ssl object>) returns -1
- SSL_get_error(<ssl object>, -1) returns SSL_ERROR_SYSCALL
- ERR_get_errno() returns 0
- errno is equal to 0
This situation was not hit before 0.9.8m. Our tentative workaround
right now (not yet committed, awaiting your insight :-)) is to detect
this particular situation and consider the call successful rather than
raise an exception.
It depends what you mean by "consider the call successful". There are
two normal, non-error return values from SSL_shutdown(): 0 and 1.
You should never consider a return of -1 to mean 1. Also, a return of 1
is really the only value that indicates "success".
Then you have errors that are either recoverable (what I term
soft-errors) or non-recoverable (hard-errors).
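As a minimal Python sketch of that split (ShutdownState and classify_shutdown are hypothetical names, not part of OpenSSL or Python's ssl module, and the integer constants are assumed to mirror the SSL_ERROR_* values in OpenSSL's ssl.h), a return from SSL_shutdown() plus, for -1, the result of SSL_get_error() falls into one of four cases:

```python
from enum import Enum

# Assumed to mirror the SSL_ERROR_* values in OpenSSL's ssl.h.
SSL_ERROR_SSL = 1
SSL_ERROR_WANT_READ = 2
SSL_ERROR_WANT_WRITE = 3
SSL_ERROR_SYSCALL = 5


class ShutdownState(Enum):
    """Hypothetical classification of one SSL_shutdown() result."""
    COMPLETE = "complete"        # returned 1: both ends' close_notify exchanged
    IN_PROGRESS = "in_progress"  # returned 0: ours sent, peer's not yet seen
    RETRY = "retry"              # -1 with WANT_READ/WANT_WRITE: soft error
    FAILED = "failed"            # any other -1: hard error


def classify_shutdown(ret, ssl_error=None):
    """Map SSL_shutdown()'s return value, plus SSL_get_error()'s result
    when it returned -1, onto a ShutdownState (hypothetical helper)."""
    if ret == 1:
        return ShutdownState.COMPLETE
    if ret == 0:
        return ShutdownState.IN_PROGRESS
    if ssl_error in (SSL_ERROR_WANT_READ, SSL_ERROR_WANT_WRITE):
        return ShutdownState.RETRY
    return ShutdownState.FAILED
```

Under this view the case in question, -1 with SSL_ERROR_SYSCALL, lands in FAILED and never in COMPLETE.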
But as the recent mailing list thread indicates (
http://www.mail-archive.com/openssl-users@openssl.org/msg60444.html ),
you may treat the specific soft-error returns of -1/WANT_READ and
-1/WANT_WRITE as if SSL_shutdown() had returned 0, provided you are
happy to keep the less descriptive behavior of older OpenSSL releases.
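If you do choose that older, less descriptive view, the mapping is small. The helper below is hypothetical and only illustrates the rule (the constants are assumed to mirror OpenSSL's ssl.h); note that it deliberately leaves -1/SSL_ERROR_SYSCALL untouched, so a hard error is never disguised as progress:

```python
# Assumed to mirror the SSL_ERROR_* values in OpenSSL's ssl.h.
SSL_ERROR_WANT_READ = 2
SSL_ERROR_WANT_WRITE = 3
SSL_ERROR_SYSCALL = 5


def legacy_shutdown_result(ret, ssl_error=None):
    """Hypothetical helper: report -1/WANT_READ and -1/WANT_WRITE as 0
    ("still in progress, call again"), matching the less descriptive
    behavior of older releases. Every other result, including
    -1/SSL_ERROR_SYSCALL, passes through unchanged."""
    if ret < 0 and ssl_error in (SSL_ERROR_WANT_READ, SSL_ERROR_WANT_WRITE):
        return 0
    return ret
```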
The SSL_ERROR_SYSCALL is probably because the underlying socket is no
longer functional (see the comments about EPIPE / ZERO_RETURN in the
recent openssl-users list thread).
You must understand that it is SSL_shutdown()'s job to commence, advance
and confirm a cryptographically secure two-way shutdown. This is its
purpose in the world. If you are seeing -1/ERROR_SYSCALL, then that is a
_CORRECT_ thing for it to return when it observes that state while trying
to perform its mission.
What SSL_shutdown() is saying by returning -1/ERROR_SYSCALL is that a
cryptographically secure two-way shutdown of the stream was _NOT_
completed and that it will probably never be completed, probably because
the underlying socket died on us. This is a fact of life you have to live
with and deal with in your application now. I say "probably" because I'm
sure other causes exist, but in practice most people will see this error
indication at this stage for those reasons.
So thinking that SSL_shutdown() was successful would be incorrect, on
the basis of my definition of the purpose of SSL_shutdown(). A
cryptographically secure shutdown was not completed, therefore
SSL_shutdown() was not successful.
I'm sorry that I've introduced this quasi-fuzziness into what was a nice
clean wonderland of the Python SSL module. But it is a reality that an
application should deal with and make its own choice about.
Many applications don't care about a cryptographically secure shutdown
of the communication transport, since they might indicate their intention
to "QUIT" in the normal application payload data. The other end would
then send back a "Bye bye" quit response message (also in the normal
application payload data), and the server end goes into a state of never
accepting any further commands from the client after that. In addition,
once each end has passed the last command/response data of the "QUIT"
exchange successfully through SSL_write(), that end can immediately call
SSL_shutdown(). This commences a secure cryptographic shutdown: it
refuses any further SSL_write() calls from your side and sends an
end-of-stream indication packet (the TLS close_notify alert) to the other
end. You then have to wait (and hope) for the other end to send its own
end-of-stream indication packet before you will see SSL_shutdown() return
1 on your side. Only once you have both sent and received the
end-of-stream indication packet will SSL_shutdown() return 1.
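The ordering of the two end-of-stream indications can be illustrated with a toy model. ToyTlsEnd is hypothetical and performs no cryptography; it only tracks which close_notify alerts have been sent and received, and imitates the 0/1 return values of SSL_shutdown() in that simplified setting:

```python
class ToyTlsEnd:
    """Hypothetical toy model of one end of the close_notify exchange
    (not real TLS). It mimics only the 0/1 results of SSL_shutdown()."""

    def __init__(self):
        self.peer = None
        self.sent = False      # our close_notify has gone out
        self.received = False  # the peer's close_notify has arrived
        self.inbox = []        # alerts delivered by the peer, not yet read

    def shutdown(self):
        # First call: send our end-of-stream indication to the peer.
        if not self.sent:
            self.peer.inbox.append("close_notify")
            self.sent = True
        # Every call: look for the peer's end-of-stream indication.
        if "close_notify" in self.inbox:
            self.inbox.remove("close_notify")
            self.received = True
        # 1 only once we have both sent and received; otherwise 0.
        return 1 if self.received else 0


def connect(a, b):
    """Wire two toy ends together (hypothetical helper)."""
    a.peer, b.peer = b, a
```

With client and server connected, client.shutdown() gives 0, then server.shutdown() gives 1 (it already holds the client's alert), and a second client.shutdown() gives 1. If the server "hangs up" instead of calling shutdown(), the client keeps getting 0.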
Many client and server implementations just "hang up" on each other once
the QUIT command response has been processed. I would guess the
-1/ERROR_SYSCALL you are seeing is due to this hanging up. But to be a
good, well-meaning TLS/SSL citizen, both ends should continue their
non-blocking event loops for a reasonable amount of time (on the order of
5 seconds, up to the TCP timeout) even after the last SSL_write() has been
made. During this time both ends retry SSL_shutdown(), once per
non-blocking wakeup indication, until it returns 1.
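A sketch of that retry loop, assuming the caller supplies two hypothetical callables: try_shutdown(), performing one SSL_shutdown() attempt and returning (ret, ssl_error), and wait_for_io(), blocking on the socket (for example with select()) for at most the remaining time. The 5-second default follows the lower bound suggested above and is only an assumption:

```python
import time

# Assumed to mirror the SSL_ERROR_* values in OpenSSL's ssl.h.
SSL_ERROR_WANT_READ = 2
SSL_ERROR_WANT_WRITE = 3


def shutdown_with_deadline(try_shutdown, wait_for_io, timeout=5.0,
                           clock=time.monotonic):
    """Retry a non-blocking shutdown until it returns 1 or time runs out.

    try_shutdown() -> (ret, ssl_error): hypothetical single attempt.
    wait_for_io(ssl_error, remaining): hypothetical wait for a wakeup;
        ssl_error is None after a 0 return (wait for the peer's alert).
    Returns True for a confirmed two-way shutdown, False otherwise."""
    deadline = clock() + timeout
    while True:
        ret, ssl_error = try_shutdown()
        if ret == 1:
            return True   # both close_notify alerts exchanged
        if ret < 0 and ssl_error not in (SSL_ERROR_WANT_READ,
                                         SSL_ERROR_WANT_WRITE):
            return False  # hard error, e.g. -1/SSL_ERROR_SYSCALL
        remaining = deadline - clock()
        if remaining <= 0:
            return False  # the peer never answered in time
        wait_for_io(ssl_error, remaining)
```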
So you have to stand back for a moment, examine Python's use of the
OpenSSL API and decide whether you are trying to be as close to 1:1 as
possible, supporting and passing on all the cryptographic guarantees that
OpenSSL makes, or whether you are trying to provide a simplified view of
the world that Noddy and Big-Ears could use. Or perhaps both, by creating
Python-specific API calls built on top of this understanding that iron out
the issue by providing easy-to-digest error returns that users might like.
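One possible shape for such a Python-level layer, sketched with hypothetical names (finish_shutdown and ShutdownIncomplete are not part of Python's ssl module): a strict mode that surfaces an incomplete shutdown as an exception carrying the raw details, and a lenient mode that reduces it to a boolean for applications that already said goodbye in their own protocol:

```python
class ShutdownIncomplete(Exception):
    """Hypothetical exception: the secure two-way shutdown did not finish.
    It keeps the raw OpenSSL details for callers who want them."""

    def __init__(self, ret, ssl_error=None, errno_value=0):
        super().__init__(
            f"TLS shutdown incomplete (ret={ret}, ssl_error={ssl_error}, "
            f"errno={errno_value})")
        self.ret = ret
        self.ssl_error = ssl_error
        self.errno_value = errno_value


def finish_shutdown(ret, ssl_error=None, errno_value=0, strict=True):
    """Hypothetical wrapper around the final SSL_shutdown() result.
    Returns True only for a confirmed two-way shutdown (ret == 1);
    otherwise raises ShutdownIncomplete (strict) or returns False."""
    if ret == 1:
        return True
    if strict:
        raise ShutdownIncomplete(ret, ssl_error, errno_value)
    return False
```

Passing strict=False gives the simplified view, while the default keeps OpenSSL's guarantee visible to the caller.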
If you are able to observe a -1 error state where you think a 1 should
have been returned, that may be considered a new bug, i.e. SSL_shutdown()
should return 1 at least once (possibly sticky/latched) once that point
in proceedings has been passed, regardless of the overall status of the
underlying transport/socket.
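The latching behavior suggested here can be illustrated at the application level with a small wrapper. LatchedShutdown is hypothetical, not an OpenSSL or Python API; it only shows what "sticky" means for the caller:

```python
class LatchedShutdown:
    """Hypothetical wrapper that makes a shutdown result of 1 sticky.

    raw_shutdown is any callable performing one SSL_shutdown() attempt and
    returning its int result. Once 1 has been seen, later calls return 1
    without touching the (possibly dead) transport again."""

    def __init__(self, raw_shutdown):
        self._raw = raw_shutdown
        self._done = False

    def __call__(self):
        if self._done:
            return 1
        ret = self._raw()
        if ret == 1:
            self._done = True  # latch: the two-way shutdown was confirmed
        return ret
```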
I am interested in the errno==0 issue; it may indicate that the real
errno value is being lost. OpenSSL should, if necessary, preserve the
first errno value it didn't expect to see, even if OpenSSL itself
continues to make kernel calls that could reset the value of errno to 0.
Maybe this situation can be simulated by being a bad citizen and forcing
a socket disconnection after one or both ends have called SSL_shutdown()
at least once. I must say my testing and applications are good citizens,
so it may never have been noticed. Also, I may have treated the
-1/ERROR_SYSCALL case as "unrecoverable" once SSL_shutdown() has been
started, and therefore never looked to check whether errno != 0 (since I
don't care about the specific reason in my usage).
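The "keep the first unexpected errno" rule can be modelled in a few lines. FirstErrno is a hypothetical helper for illustration only; in C the equivalent would be to copy errno immediately after the failing call, before any later call can overwrite it:

```python
import errno as errno_mod


class FirstErrno:
    """Hypothetical helper: remember the first unexpected non-zero errno,
    so later calls that leave errno at 0 cannot hide the original cause."""

    def __init__(self):
        self.value = 0

    def record(self, value):
        # Keep only the first non-zero value; ignore 0 and later values.
        if self.value == 0 and value != 0:
            self.value = value
        return self.value

    def describe(self):
        if self.value == 0:
            return "errno lost (0)"
        return errno_mod.errorcode.get(self.value, str(self.value))
```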
What encouraged me in that workaround is that some LightHTTPd users have
encountered what looks like the same issue, also starting from 0.9.8m:
http://redmine.lighttpd.net/boards/2/topics/2779
« SSL_shutdown failed, SSL_get_error returned SSL_ERROR_SYSCALL,
but errno == 0 - I think there is something wrong with your ssl
lib. »
« Since I updated to openssl 0.9.8m I have noticed the same
error messages in my log. (using lighttpd 1.4.26 with the same
patch applied) »
I would welcome any explanations and suggestions concerning this
situation. Is it an OpenSSL bug? Or does this error return correspond to
an application-level error? (in which case, which error exactly, since the
return codes don't point to anything precise)
Well, the simplified view of it is this (the exact errno value isn't
important in the decision-making process, since it does not change the
outcome).
I still think it is probably due to the state of the network socket
changing to no longer operational BEFORE SSL_shutdown() could complete
the two-way cryptographic shutdown.
As such, this situation is unrecoverable.
So the correct course of action is to accept that SSL_shutdown() did not
complete, deallocate the SSL objects and clean up your side's affairs,
for example by closing the socket handle you are holding.
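Sketched in Python, that decision ignores the specific errno and always releases resources. conclude_connection is a hypothetical helper, and release_ssl stands in for whatever frees the SSL object (SSL_free() in C); both are assumptions of this sketch:

```python
import socket


def conclude_connection(final_ret, sock, release_ssl):
    """Hypothetical cleanup after the last SSL_shutdown() attempt.

    final_ret is that attempt's result (1 = secure shutdown confirmed); the
    specific errno is deliberately not consulted, since it does not change
    what we do. release_ssl is a hypothetical stand-in for SSL_free().
    Returns True only when the two-way shutdown was confirmed."""
    try:
        release_ssl()  # deallocate the SSL object(s) first
    finally:
        sock.close()   # then close our socket handle in every case
    return final_ret == 1
```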
I think you are correct to assert that an OpenSSL bug exists if you are
able to observe -1/ERROR_SYSCALL and errno==0.
But it is not a bug to observe -1/ERROR_SYSCALL from SSL_shutdown().
HTH
Darryl
______________________________________________________________________
OpenSSL Project http://www.openssl.org
User Support Mailing List openssl-users@openssl.org
Automated List Manager majord...@openssl.org