On Fri, May 10, 2024 at 7:23 AM Daniel Gustafsson <dan...@yesql.se> wrote:
> The way multiple certificates are handled is that libpq creates one SSL_CTX for
> each at startup, and switch to the appropriate one when the connection is
> inspected.

I fell in a rabbit hole while testing this patch, so this review isn't
complete, but I don't want to delay it any more. I see a few
possibly-related problems with the handling of SSL_context.

The first is that reloading the server configuration doesn't reset the
contexts list, so the server starts behaving in really strange ways the
longer you test. That's an easy enough fix, but things got weirder when
I did. Part of that weirdness is that SSL_context gets set to the last
initialized context, so fallback doesn't always behave in a
deterministic fashion. But we do have to set it to something, to create
the SSL object itself...

I tried patching all that, but I continue to see nondeterministic
behavior, including the wrong certificate chain occasionally being
served, and the servername callback being called twice for each
connection (?!). Since I can't reproduce the weirdest bits under a
debugger yet, I don't really know what's happening. Maybe my patches
are buggy. Or maybe we're running into some chicken-and-egg madness?

The order of operations looks like this:

1. Create a list of contexts, selecting one as an arbitrary default
2. Create an SSL object from our default context
3. During the servername_callback, reparent that SSL object (which has
   an active connection underway) to the actual context we want to use
4. Complete the connection

It's step 3 that I'm squinting at. I wondered how, exactly, that worked
in practice, and based on this issue the answer might be "not well":

    https://github.com/openssl/openssl/issues/6109

Matt Caswell appears to be convinced that SSL_set_SSL_CTX() is
fundamentally broken. So it might just be FUD, but I'm wondering if we
should instead be using the SSL_ flavors of the API to reassign the
certificate chain on the SSL pointer directly, inside the callback,
instead of trying to set them indirectly via the SSL_CTX_ API.

Have you seen any weird behavior like this on your end? I'm starting to
doubt my test setup...

On the plus side, I now have a handful of debugging patches for a
future commitfest.

Thanks,
--Jacob