[HACKERS] SSL renegotiation and other related woes

Andres Freund Mon, 26 Jan 2015 02:16:49 -0800

Hi,

When working on getting rid of ImmediateInterruptOK I wanted to verify
that ssl still works correctly. Turned out it didn't. But neither did it
in master.


Turns out there are two major things we do wrong:

1) We ignore the rule that once SSL_read()/SSL_write() has returned
   SSL_ERROR_WANT_(READ|WRITE), it has to be called again with the same
   parameters. Unfortunately that rule doesn't just mean that the same
   parameters have to be passed in, but also that we can't constantly
   switch between _read()/_write(). Especially the nonblocking backend
   code (i.e. walsender) and the whole frontend code violate this rule.

2) We start renegotiations in be_tls_write() while in nonblocking mode,
   but don't properly handle socket readiness when retrying. There's a
   loop that retries handshakes twenty times (???), but what is actually
   needed is to call SSL_get_error() and ensure that there's actually
   data available.
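To make rule 1) concrete, here is a sketch of the bookkeeping it implies: a hypothetical guard that remembers which call returned WANT_READ/WANT_WRITE and with which buffer and length, and rejects any other call until that one has completed. tls_op, tls_retry_guard and the two functions are made-up names for illustration, not PostgreSQL or OpenSSL API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical bookkeeping for the OpenSSL retry rule described in 1).
 * None of these names exist in PostgreSQL or OpenSSL.
 */
typedef enum { TLS_OP_NONE, TLS_OP_READ, TLS_OP_WRITE } tls_op;

typedef struct
{
    tls_op      pending;   /* call that returned WANT_READ/WANT_WRITE */
    const void *buf;       /* buffer that call was given */
    size_t      len;       /* length that call was given */
} tls_retry_guard;

/*
 * May we issue `op` on (buf, len) now?  Only if nothing is pending, or if
 * this is an exact repeat of the pending call.  Switching from a pending
 * write to a read (or vice versa) is what 1) says we get wrong.
 */
static bool
tls_call_allowed(const tls_retry_guard *g, tls_op op,
                 const void *buf, size_t len)
{
    if (g->pending == TLS_OP_NONE)
        return true;
    return g->pending == op && g->buf == buf && g->len == len;
}

/* Record the outcome: remember the call if it must be retried, else clear. */
static void
tls_call_finished(tls_retry_guard *g, tls_op op,
                  const void *buf, size_t len, bool wants_retry)
{
    assert(op != TLS_OP_NONE);
    if (wants_retry)
    {
        g->pending = op;
        g->buf = buf;
        g->len = len;
    }
    else
        g->pending = TLS_OP_NONE;
}
```

The pointer comparison is stricter than necessary for writes when SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER is set, since that mode lets the buffer address change as long as the contents stay the same.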

2) is easy enough to fix, but 1) is pretty hard. Before anybody says
that 1) isn't an important rule: it reliably causes connection aborts
within a couple of renegotiations. The higher the latency, the higher the
likelihood of aborts, but even locally it doesn't take very long to
abort. The errors are usually something like "SSL connection has been
closed unexpectedly" or "SSL Error: sslv3 alert unexpected message", plus
a host of other similar messages. There are a couple of reports of those
in the archives, and I've seen many more in client logs.

As far as I can see the only realistic way to fix 1) is to change both
frontend and backend code to:
a) Always check for socket read/writeability before calling
   SSL_read/write() when in nonblocking mode. That's a bit annoying
   because it nearly doubles the number of syscalls we do for client
   communication, but I can't really see an alternative. That allows us
   to avoid waiting inside after a WANT_READ/WRITE, or having to set up a
   larger state machine that keeps track of what we tried last.

b) When SSL_read/write nonetheless returns WANT_READ/WRITE even though
   we tested for read/writeability, we're very likely doing
   renegotiation. In that case we'll just have to block. There's already
   code that busy loops (and thus waits) in the frontend
   (cf. pgtls_read's WANT_WRITE case, triggered during reneg). We can't
   just return immediately to the upper layers, as we'd otherwise likely
   violate the rule about calling SSL again with the same parameters.

c) Add a somewhat hacky optimization whereby we allow breaking out of a
   WANT_READ condition on a nonblocking socket when ssl->state ==
   SSL_ST_OK. That's the case where it is actually safe not to wait, at
   least by my reading of the unreadable ssl code. That case is somewhat
   important because otherwise we can end up waiting on both sides due
   to b), even when nonblocking calls were actually made. That
   condition essentially means that we'll only block if renegotiation or
   partial reads are in progress. Afaics at least.

d) Remove the SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER hack - we don't
   actually need it anymore.
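Steps a) to c) amount to a small decision policy around each SSL_read()/SSL_write() call. The sketch below models that policy with plain booleans; io_result, io_action and both functions are hypothetical names, socket_ready stands for the result of a zero-timeout poll() check, and handshake_idle stands for the ssl->state == SSL_ST_OK test described in c).

```c
#include <stdbool.h>

/* Hypothetical outcome of one SSL_read()/SSL_write() attempt. */
typedef enum { IO_OK, IO_WANT_READ, IO_WANT_WRITE, IO_ERROR } io_result;

/* Hypothetical next step for the caller after an attempt. */
typedef enum { ACT_RETURN, ACT_BLOCK_AND_RETRY } io_action;

/*
 * a) In nonblocking mode, only enter the TLS library once poll() has said
 * the socket is ready; otherwise report "would block" to the caller
 * without touching the SSL object, so no retry obligation arises.
 */
static bool
should_call_tls(bool nonblocking, bool socket_ready)
{
    return !nonblocking || socket_ready;
}

/*
 * b) and c): decide what to do after the call returned.  handshake_idle
 * stands for the ssl->state == SSL_ST_OK check from c).
 */
static io_action
after_tls_call(io_result res, bool nonblocking, bool handshake_idle)
{
    if (res == IO_OK || res == IO_ERROR)
        return ACT_RETURN;

    /* c) No handshake in progress: returning instead of waiting is
     * believed to be safe for a plain WANT_READ. */
    if (nonblocking && res == IO_WANT_READ && handshake_idle)
        return ACT_RETURN;

    /*
     * b) Either the socket is blocking anyway, or we tested readiness and
     * still got WANT_*, which most likely means renegotiation: block, then
     * repeat the same call with the same arguments.
     */
    return ACT_BLOCK_AND_RETRY;
}
```

Blocking in the ACT_BLOCK_AND_RETRY case is the cost b) accepts; one way to block would be poll() with a negative timeout in the direction indicated by the WANT_* result.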

These errors are much less frequent when using a plain frontend
(e.g. psql/pgbench) because they don't use COPY BOTH - the way these
clients use the FE/BE protocol, there are essentially natural
synchronization points where nothing but renegotiation happens. With
walsender (or pipelined queries!) both sides can write at the same time.


My testcase for this is just to set up a server with a low
ssl_renegotiation_limit, generate lots of WAL (wal.sql attached) and
receive data via pg_receivexlog -n. Usually it'll error out quickly.


I've done a preliminary implementation of the above steps and it
survives transferring 25GB of WAL via the replication protocol with a
ssl_renegotiation_limit=100kB - previously it failed much earlier.


Does anybody have a neater way to tackle this? I'm not happy about this
solution, but I really can't think of anything better (save ditching
openssl maybe).  I'm willing to clean up my hacked up fix for this, but
not if we can't find agreement on the approach.

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
