On Mon, 13 May 2019 at 06:31:26 +0200, Steffen Ullrich wrote:
> Applications which relied on blocking I/O in connection with select could
> also hang before,

Uh, what?  “Before” meaning with ≤TLSv1.2, or with OpenSSL <1.1.1a's
default flags?  libssl mentions no such thing beside the new default
mode.  And in fact the s_client() program, *from OpenSSL upstream
itself*, does precisely that: a select loop in blocking I/O mode (unless
the ‘-nbio’ flag is set).

    https://github.com/openssl/openssl/blob/OpenSSL_1_1_1/apps/s_client.c#L2817

s_client() is roughly speaking a C version the ‘netcat.pl’ prototype I
attached earlier.  Unsurprisingly, since 1.1.1a the code clears
SSL_MODE_AUTO_RETRY from the bitmask mode of the newly created SSL_CTX
object.  That change was even done in the very same commit that enabled
SSL_MODE_AUTO_RETRY by default :-)

    
https://github.com/openssl/openssl/commit/693cf80c6ff54ae276a44d305d4ad07168ec6895#diff-7f3b79983f6d53c047c90a62813cc11f

IMHO using polling sockets in blocking I/O is fairly common.  Granted,
that itself itself doesn't say much about code quality, but the fact that
the OpenSSL project *does* use it (and explicitly mentions it in the
docs) gives me confidence that it's a fine use which should still be
supported.

> only this problem is worse now. Before TLS 1.3 these applications
> could hang if the peer initiated a renegotiation since this were TCP
> level data without any SSL application payload, i.e. select was
> triggered but sysread would not return with data.

I feel that we're talking past each other :-/  select() being triggered
means indeed that a subsequent read() system call *won't block*;
however read() is not called directly, but through SSL_read().  Whether
SSL_read() will block until application data is received, or not, is
controlled with the SSL_MODE_AUTO_RETRY flag.  If that flag is unset,
then SSL_read() returns immediately with failure and sets SSL_ERROR.
For an IO::Socket::SSL object, $sock->sysread() is AFAICT a mere wrapper
for SSL_read():

    
https://github.com/noxxi/p5-io-socket-ssl/blob/2.066/lib/IO/Socket/SSL.pm#L1187

So if select() is triggered because the TLS session is renegotiated, but
no application data was received, sysread will block iff. SSL_MODE_AUTO_RETRY
is set.  As I showed in the traces enclosed in my previous message.

> Additionally switching off SSL_MODE_AUTO_RETRY would actually just add a
> different unexpected behavior: that sysread might return with EAGAIN on a
> blocking socket. This is not the behavior one expects from a blocking
> socket, i.e. it should block until it returns data, should return no data
> only on connection shutdown or should fail permanently.

The documentation (libssl's SSL_read()'s [0], Net::SSLeay::read() [1],
as well as IO::Socket::SSL::sysread() [2]) all warn that reading from an
SSL socket has a different behavior than usual read(2) system calls, in
that read failures should be treated with care, as there are both
retryable (eg. when no application data was received) and non-retryable
errors.

So from my perspective, the expectation that IO::Socket::SSL::sysread()
behaves like a “normal” sysread doesn't hold, and never did.  (There is
even an entry for “Expecting exactly the same behavior as plain
sockets” in the “Common errors” section of IO::Socket::SSL's manpage!)
For what it's worth I think it's a shame that SSL_MODE_AUTO_RETRY was
not the default earlier, as it's convenient to be able to write

    $sock->sysread($buf, $len) // die;

and not bother about inspecting $SSL_ERROR.  But programs have been
written with the old default in mind, and the fact that SSL_read()
doesn't behave like a normal read() system call.  For these programs to
keep working, there needs to be a way to switch SSL_MODE_AUTO_RETRY off.
libssl provides SSL_CTX_clear_mode(), Net::SSLeay has
CTX_ctrl(,78,SSL_CTRL_CLEAR_MODE,0), and I'd like to have something
similar in IO::Socket::SSL, too.

> It was just a coincidence that LWP::protocol::http could deal with
> this situation. And this coincidence came from the fact, that this
> code was actually designed for non-blocking sockets and only the
> Debian patch caused it to use a blocking socket instead.

Yup, I now agree with you as far as LWP is concerned.
 
> I've added more information regarding this to the IO::Socket::SSL
> documentation:
> https://github.com/noxxi/p5-io-socket-ssl/commit/ee176e489f02bfaaa479fc8d9312c8458bf55565

| A sysread on the IO::Socket::SSL socket will not return any data
| though since it is an abstraction which only returns application data.
| This causes the sysread to hang in case the socket was blocking

I believe that statement is incorrect for OpenSSL <1.1.1a's (or if
SSL_MODE_AUTO_RETRY was toggled via some other means).  If you try
netcat.pl with an older OpenSSL release, you won't be able to reproduce
the TLSv1.2 trace I pasted yesterday; the client doesn't end up stuck in
a blocking read :-)  If no application data is available, then
IO::Socket::SSL::sysread() hangs in blocking I/O iff. SSL_MODE_AUTO_RETRY
is set.

Cheers,
-- 
Guilhem.

[0] “If the underlying BIO is blocking, a read function will only return
    once the read operation has been finished or an error occurred,
    except when a non-application data record has been processed and
    SSL_MODE_AUTO_RETRY is not set. Note that if SSL_MODE_AUTO_RETRY is
    set and only non-application data is available the call will hang.”
    — https://www.openssl.org/docs/man1.1.1/man3/SSL_read.html
[1] “If you need to select(2) on the socket, go right ahead, but be
    warned that OpenSSL does some internal buffering so SSL_read does
    not always return data even if the socket selected for reading (just
    keep on selecting and trying to read). "Net::SSLeay" is no different
    from the C language OpenSSL in this respect.”
    — https://metacpan.org/pod/Net::SSLeay#Sockets
[2] “Thus, in order to decide if you can read more data (e.g. if sysread
    will block) you must check if there are still data in the current
    SSL frame by calling pending […]
    […]
    Also, calls to sysread might fail, because it must first finish an
    SSL handshake.”
    — https://metacpan.org/pod/IO::Socket::SSL#sysread

Attachment: signature.asc
Description: PGP signature

Reply via email to