I have recently been debugging some corner cases in OpenSSL's SSL_shutdown() call in sendmail (I ask your forgiveness), and now that I seem to have it right there, I have decided to look at other mailers for similar issues.
A discussion with the OpenSSL folks on how to properly shut down a connection, along with my revised sendmail patch (which has been submitted to the sendmail maintainers), is at https://github.com/openssl/openssl/issues/13976

Looking at postfix/src/tls/tls_bio_ops.c, I see the following potential problems. Note that I do not run Postfix, nor have I tested a Postfix patch, so I may not have this _exactly_ right.

What I think happens on a call to tls_bio for an SSL_shutdown:

- Enter the for(;;) loop.
- hsfunc = SSL_shutdown is called the first time, which returns status = 0.
- SSL_get_error returns err = SSL_ERROR_ZERO_RETURN.
- In the switch, case SSL_ERROR_ZERO_RETURN is taken.
- It falls through to SSL_ERROR_NONE.
- It falls through to SSL_ERROR_SYSCALL.
- It returns up the stack.

I suspect the calling function then closes the connection, potentially before the 2-way handshake with the peer has completed. This results in an unclean shutdown.

1) For SSL_shutdown, case SSL_ERROR_ZERO_RETURN should pause and retry.

As documented by OpenSSL, to complete a 2-way handshake SSL_shutdown should be called once to initiate the handshake, and will return 0. It should then be retried while SSL_get_error reports SSL_ERROR_WANT_READ, until it finally returns 1 to indicate a successful 2-way shutdown. I believe that, minimally, case SSL_ERROR_ZERO_RETURN should do something like this:

    /* Waiting for 2-way handshake. */
    if (hsfunc == SSL_shutdown) {
        usleep(100);    /* 100 microseconds, then retry the for(;;) loop */
        continue;
    }

2) SSL_ERROR_SYSCALL has a special case where it does not indicate a failed syscall. See the BUGS section of https://www.openssl.org/docs/man1.1.1/man3/SSL_get_error.html, which states:

    The SSL_ERROR_SYSCALL with errno value of 0 indicates unexpected EOF from the peer. This will be properly reported as SSL_ERROR_SSL with reason code SSL_R_UNEXPECTED_EOF_WHILE_READING in the OpenSSL 3.0 release because it is truly a TLS protocol error to terminate the connection without a SSL_shutdown().
Postfix does not seem to handle this special case; instead it assumes every SSL_ERROR_SYSCALL is in fact a syscall error. I suspect this is largely a cosmetic bug that puts the wrong error message in logs and debugging output.

3) Postfix will busy-wait on a 2-way shutdown.

If a fix like the one I describe in point #1 were put in place, Postfix would then busy-wait on a 2-way shutdown. With TLS1.2+ the shutdown sequence has been changed to be a 2-way shutdown handshake: a request is sent to shut down TLS, and the far end must respond with an acknowledgement. In OpenSSL this is accomplished by a first call to SSL_shutdown to send the request, which returns 0, and then a subsequent call to SSL_shutdown that returns 1 to indicate the reply was received. While waiting for the reply, SSL_get_error reports SSL_ERROR_WANT_READ for SSL_shutdown, meaning it wants to read from the far end.

When Postfix receives SSL_ERROR_WANT_READ it checks for a timeout (this is good; don't wait forever), but then it simply breaks out of the switch and goes back into the for(;;) loop. The effect is to busy-wait for the reply from the far end. Given that this can take network time, e.g. 150ms from the US to Europe, that could be an expensive busy-wait loop.

I recommend, at least for SSL_shutdown, that SSL_ERROR_WANT_READ result in a short delay. I am testing a usleep of 100 microseconds (0.1ms) in sendmail so that this is not a busy-wait loop. Some small delay may be appropriate for SSL_ERROR_WANT_READ from other functions as well, for similar reasons.

A warning for anyone testing modifications on an Internet-facing mail server: I found a case in sendmail where, when acting as a client, it does not perform a clean TLS shutdown. I also believe point #1 means that Postfix may not be cleanly shutting down all connections. As a result, you will quite likely see a significant number of unclean shutdowns from other mailers. Currently I am seeing about 44% clean shutdowns, and 66% unclean shutdowns from remote Internet mailers.
I'm on a quest here to see if I can get all mailers to handle all of these cases correctly, and make the Internet a better place.

--
Leo Bicknell - bickn...@ufp.org
PGP keys at http://www.ufp.org/~bicknell/