RE: SSL_read() and dropped (half-open) connections

Michael Wojcik Fri, 09 May 2014 14:16:31 -0700

> From: owner-openssl-us...@openssl.org [mailto:owner-openssl-
> us...@openssl.org] On Behalf Of Tilman Sauerbeck
> Sent: Thursday, 08 May, 2014 12:26
> 
> my program is an SSL client which is reading large amounts of data
> without sending data itself (after the initial handshake).
> My machine's connection does drop regularly, and I want to make sure
> that my program detects the dropped connection instead of hanging in
> read()/recv() forever.
> ...
> Another attempt was to use select() to check if the socket is readable
> just before calling SSL_read(), like so:


That's not useful in your case. When you're receiving on a TCP connection and 
not sending, and the connection is closed, the socket will be flagged as 
readable. With select() you can't distinguish between a connection that has 
data available and a connection that has received a TCP FIN or RST.

Select's "readable" status means "a read operation on this descriptor will 
return immediately, and if the descriptor is non-blocking, will not return -1 
with errno set to EAGAIN/EWOULDBLOCK". It's set for any condition in which read 
would return immediately: data available, connection closed, or error pending.

> Without the SO_RCVTIMEO, this doesn't work either, probably because
> I'm only using select() if SSL_read() failed with SSL_ERROR_WANT_READ
> before.

I suspect it doesn't work because select is returning non-zero, because the 
socket is readable.

> In combination with the socket timeout (SO_RCVTIMEO), the code above
> does work, but it doesn't feel right.

To some extent this depends on what happens when your connection fails, and 
whether you have TCP keepalives enabled for the socket.

A connection failure doesn't necessarily cause the stack to abort the 
conversation. "Connection failure" isn't well-defined for TCP/IP - it can mean 
any of a number of things, and different implementations are going to handle it 
differently. Since IP was designed as a best-effort, failure-tolerant protocol, 
many implementations tolerate connection failure in the hope that packets for 
the conversation can still arrive by another route, or will be able to arrive 
later.

If the local side isn't sending, then there's no default timeout for this 
behavior - the receiving side can keep the conversation open forever, as long 
as it managed to ACK the most recent transmission it received from the peer. 
(If the local side is sending, the TCP retransmit timer will expire eventually.)

A passive endpoint - one that's just receiving - can try to detect connection 
failure in various ways. It can use periodic application-level probes, which is 
what the infamous TLS Heartbeat mechanism is for. It can enable TCP keepalive, 
which is a probe at the TCP level. Or it can use SO_RCVTIMEO.

None of these are ideal. Application-level probes require logic on both sides. 
TCP keepalive is often only configurable on a system-wide basis (though it's 
enabled per-socket) and generally takes over an hour to decide a connection has 
failed. SO_RCVTIMEO isn't universally supported (it's standard in SUSv3 but 
implementations aren't required to honor it). And all of these have the 
architectural disadvantage of sacrificing IP's recoverability in favor of 
deciding to give up programmatically; they take that decision out of the user's 
hands.
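As a rough illustration of the TCP keepalive option, here is a sketch assuming a POSIX sockets API. SO_KEEPALIVE is the standard per-socket switch. TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are per-socket tuning options that some platforms (Linux, for example) provide; they are not guaranteed to exist, so they are guarded by the preprocessor. The function name enable_keepalive() and its parameter names are assumptions made for this example.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Enables TCP keepalive on fd. Returns 0 on success, -1 on failure.
 * idle_secs, interval_secs and probe_count are illustrative names; they
 * are applied only where the platform exposes per-socket tuning options.
 */
static int enable_keepalive(int fd, int idle_secs, int interval_secs,
                            int probe_count)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) != 0)
        return -1;

#if defined(TCP_KEEPIDLE) && defined(TCP_KEEPINTVL) && defined(TCP_KEEPCNT)
    /* Per-socket tuning; elsewhere only system-wide settings may apply. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
                   &idle_secs, sizeof idle_secs) != 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL,
                   &interval_secs, sizeof interval_secs) != 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,
                   &probe_count, sizeof probe_count) != 0)
        return -1;
#else
    (void)idle_secs;
    (void)interval_secs;
    (void)probe_count;
#endif
    return 0;
}
```

A call such as enable_keepalive(fd, 60, 10, 5) would, where the tuning options are available, start probing after 60 idle seconds and give up after 5 unanswered probes sent 10 seconds apart. Those numbers are arbitrary examples, not recommendations.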

> I'm also wondering if BIO_sock_non_fatal_error() needs to be fixed to
> treat EAGAIN and EWOULDBLOCK as fatal _iff_ the socket is blocking --
> since that means that we hit a timeout.

That might be a useful enhancement, but you can't depend on it on all 
platforms, since SO_RCVTIMEO isn't guaranteed.

> I know I can work around this issue by manually checking errno for
> EAGAIN/EWOULDBLOCK in case SSL_get_error() returns SSL_ERROR_WANT_READ,
> but that seems the least solid solution.

To be honest, that's what I'd do.
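A sketch of the underlying mechanism at the plain-socket level, assuming a platform that honors SO_RCVTIMEO: a blocking recv() that times out fails with errno set to EAGAIN or EWOULDBLOCK. The helper names set_recv_timeout() and recv_or_timeout(), and the -2 return code, are invented for this example. With SSL_read(), the equivalent errno test would sit in the branch where SSL_get_error() returns SSL_ERROR_WANT_READ, as described in the quoted text.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Sets a receive timeout on a blocking socket.
 * Returns 0 on success, -1 if the option could not be set. */
static int set_recv_timeout(int fd, int timeout_ms)
{
    struct timeval tv;

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
}

/*
 * Receives on a blocking socket that has SO_RCVTIMEO set.
 * Returns bytes read, 0 on end-of-stream, -1 on any other error,
 * and -2 (an arbitrary code chosen for this sketch) if the timeout expired.
 */
static ssize_t recv_or_timeout(int fd, void *buf, size_t len)
{
    ssize_t n = recv(fd, buf, len, 0);

    /* On a blocking socket, EAGAIN/EWOULDBLOCK means the timeout fired. */
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;
    return n;
}
```

The sketch does not retry on EINTR; a real program would probably want to.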

> Can anyone shed some light on this issue?
> What am I missing?

Only that usual practice for an application that wants to be able to abort a 
TCP receive operation is to use non-blocking sockets. SO_RCVTIMEO was created 
when threading became commonly available on UNIX platforms and blocking network 
I/O became a more usable approach for complex applications. Consequently, most 
people who want more control over the behavior of a passive TCP endpoint still 
use non-blocking sockets.

> Please CC me in your replies; I'm not subscribed to the list.

Hmm. In my day, that was considered rude. You kids with your music that's not 
identical to my music and hairstyles that aren't identical to my hairstyle...

-- 
Michael Wojcik
Technology Specialist, Micro Focus



