Hello Valters,

On Tue, Oct 24, 2023 at 02:03:03AM +0300, Valters Jansons wrote:
> Hello,
>
> The trace log is uploaded at
> https://gist.github.com/sigv/58a5d148579c75d39b2b7c76a3254fa5
>
> We are running 2.9-dev8 for the server connection close fix for
> "not-so-great" gRPC clients. We just experienced an ha_panic seemingly
> triggered from OpenSSL 3. This is a fairly default Ubuntu 22.04
> system, with locally built HAProxy package (as there are no "official"
> dev builds).

Hmm that's not cool. Did it happen only once or repeatedly? From what
I'm seeing, one of the SSL library calls froze the thread for 4 seconds
without making progress. Given openssl 3's extreme abuse of locking, it
sounds perfectly possible that under load one such thread never manages
to make progress and repeatedly fails.

> Our SSL/TLS configuration is fairly basic too. I do not think it
> contributes to the issue on hand. On bind we have `strict-sni`, a
> `crt-list` specified and `alpn h2,http/1.1`.
>
> ssl-default-bind-options ssl-min-ver TLSv1.2 no-tls-tickets
> ssl-default-bind-ciphers
> ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384
> ssl-default-bind-ciphersuites
> TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256

I'm not fluent in this but I'm not seeing any eccentricities there.

> In our log, we have some "SSL handshake failure" lines and some more
> detailed "SSL handshake failure (error:0A00010B:SSL routines::wrong
> version number)" lines. I presume these are not related -- instead
> being caused by some clients potentially connecting to port 443 and
> trying to talk plaintext, or wanting to run TLS 1.1 or older.

It's very possible, indeed.

Do you have SSL on the frontend only or also on the backend? I'm asking
because openssl3 is very bad on the frontend but it's close to unusable
on the backend. We've seen configs saturate the CPU using only health
checks!

Also, is your machine heavily loaded or not? I'm trying to estimate if
it's worth switching to alternate locks in your case. In case you're
interested in giving it a try, you can add USE_PTHREAD_EMULATION=1 to
your "make" command line.
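As a rough sketch only: the TARGET and USE_OPENSSL values below are
assumptions about a typical Linux build, not something taken from your
packaging, so keep whatever options your package already passes and
just append the last one:

```shell
# Assumed baseline options (TARGET, USE_OPENSSL); adjust to your build.
# Only USE_PTHREAD_EMULATION=1 is the addition being suggested here.
make -j"$(nproc)" TARGET=linux-glibc USE_OPENSSL=1 USE_PTHREAD_EMULATION=1
```

A full rebuild (after "make clean") is probably the safest way to make
sure every object file picks up the change.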
This option may seem to use more CPU, but it in fact replaces the
sleeping wait with a shorter active wait, resulting in faster processing
and load reporting that is real rather than just apparent.

Otherwise, if your SSL load is high, particularly on the backend, you
may need to switch back to a distro featuring openssl 1.1.1 (such as
ubuntu 20.04 for example).

Regards,
Willy