Re: [HACKERS] SSL renegotiation and other related woes

Andres Freund Wed, 11 Feb 2015 15:43:09 -0800

Hi,

On 2015-02-05 23:03:02 +0200, Heikki Linnakangas wrote:
> On 01/26/2015 12:14 PM, Andres Freund wrote:
> >Hi,
> >
> >When working on getting rid of ImmediateInterruptOK I wanted to verify
> >that ssl still works correctly. Turned out it didn't. But neither did it
> >in master.
> >
> >Turns out there's two major things we do wrong:
> >
> >1) We ignore the rule that once called and returning
> >    SSL_ERROR_WANTS_(READ|WRITE) SSL_read()/write() have to be called
> >    again with the same parameters. Unfortunately that rule doesn't mean
> >    just that the same parameters have to be passed in, but also that we
> >    can't just constantly switch between _read()/write(). Especially
> >    nonblocking backend code (i.e. walsender) and the whole frontend code
> >    violate this rule.
> 
> I don't buy that. Googling around, and looking at the man pages, I just
> don't see anything to support that claim. I believe it is supported to
> switch between reads and writes.


There's at least two implementations that seem to have workaround to
achieve something like that. Apache's mod_ssl and the ssl layer for
libevent.

> >2) We start renegotiations in be_tls_write() while in nonblocking mode,
> >    but don't properly retry to handle socket readyness. There's a loop
> >    that retries handshakes twenty times (???), but what actually is
> >    needed is to call SSL_get_error() and ensure that there's actually
> >    data available.
> 
> Yeah, that's just crazy.
> 
> Actually, I don't think we need to call SSL_do_handshake() at all. At least
> on modern OpenSSL versions. OpenSSL internally uses a state machine, and
> SSL_read() and SSL_write() calls will complete the handshake just as well.

Some blaming seems to show that that's been the case for a long time.

> >2) is easy enough to fix, but 1) is pretty hard. Before anybody says
> >that 1) isn't an important rule: It reliably causes connection aborts
> >within a couple renegotiations. The higher the latency the higher the
> >likelihood of aborts. Even locally it doesn't take very long to
> >abort. Errors usually are something like "SSL connection has been closed
> >unexpectedly" or "SSL Error: sslv3 alert unexpected message" and a host
> >of other similar messages. There's a couple reports of those in the
> >archives and I've seen many more in client logs.
> 
> Yeah, I can reproduce that. There's clearly something wrong.

> I believe this is an OpenSSL bug. I traced the "unexpected message" error to
> this code in OpenSSL's s3_read_bytes() function:

Yep, I got to that as well.

> >    case SSL3_RT_APPLICATION_DATA:
> >        /*
> >         * At this point, we were expecting handshake data, but have
> >         * application data.  If the library was running inside ssl3_read()
> >         * (i.e. in_read_app_data is set) and it makes sense to read
> >         * application data at this point (session renegotiation not yet
> >         * started), we will indulge it.
> >         */
> >        if (s->s3->in_read_app_data &&
> >            (s->s3->total_renegotiations != 0) &&
> >            (((s->state & SSL_ST_CONNECT) &&
> >              (s->state >= SSL3_ST_CW_CLNT_HELLO_A) &&
> >              (s->state <= SSL3_ST_CR_SRVR_HELLO_A)
> >             ) || ((s->state & SSL_ST_ACCEPT) &&
> >                   (s->state <= SSL3_ST_SW_HELLO_REQ_A) &&
> >                   (s->state >= SSL3_ST_SR_CLNT_HELLO_A)
> >             )
> >            )) {
> >            s->s3->in_read_app_data = 2;
> >            return (-1);
> >        } else {
> >            al = SSL_AD_UNEXPECTED_MESSAGE;
> >         SSLerr(SSL_F_SSL3_READ_BYTES, SSL_R_UNEXPECTED_RECORD);
> >            goto f_err;
> >        }
> 
> So that quite clearly throws an error, *unless* the original application
> call was SSL_read(). As you also concluded, OpenSSL doesn't like it when
> SSL_write() is called when it's in renegotiation. I think that's OpenSSL's
> fault and it should "indulge" the application data message even in
> SSL_write().

That'd be nice, but I think there were multiple reports to them about
this and there wasn't much of a response. Maybe it's time to try again.

> In theory, I guess we should do similar hacks in the server, and always call
> SSL_read() before SSL_write(), but it seems to work without it. Or maybe
> not; OpenSSL server and client code is not symmetric, so perhaps it works in
> server mode without that.

I think we should do it on the server side - got some server side errors
when I cranked up the pg_receivexlog's status interval and set the wal
sender timeout very low.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] SSL renegotiation and other related woes

Reply via email to