On Wed, Sep 11, 2024 at 3:54 PM Jacob Champion <jacob.champ...@enterprisedb.com> wrote:
> Yeah, and I still owe you all an updated roadmap.
Okay, here goes. New reviewers: start here!

== What is This? ==

OAuth 2.0 is a way for a trusted third party (a "provider") to tell a server whether a client on the other end of the line is allowed to do something. This patchset adds OAuth support to libpq with libcurl, provides a server-side API so that extension modules can add support for specific OAuth providers, and extends our SASL support to carry the OAuth access tokens over the OAUTHBEARER mechanism.

Most OAuth clients use a web browser to perform the third-party handshake. (These are your "Okta logins", "sign in with XXX", etc.) But there are plenty of people who use psql without a local browser, and invoking a browser safely across all supported platforms is actually surprisingly fraught. So this patchset implements something called device authorization, where the client will display a link and a code, and then you can log in on whatever device is convenient for you. Once you've told your provider that you trust libpq to connect to Postgres on your behalf, it'll give libpq an access token, and libpq will forward that on to the server.

== How This Fits, or: The Sales Pitch ==

The most popular third-party auth methods we have today are probably the Kerberos family (AD/GSS/SSPI) and LDAP. If you're not already in an MS ecosystem, it's unlikely that you're using the former. And users of the latter are, in my experience, more-or-less resigned to its use, in spite of LDAP's architectural security problems and the fact that you have to run weird synchronization scripts to tell Postgres what certain users are allowed to do.

OAuth provides a decently mature and widely-deployed third option. You don't have to be running the infrastructure yourself, as long as you have a provider you trust.
If you are running your own infrastructure (or if your provider is configurable), the tokens being passed around can carry org-specific user privileges, so that Postgres can figure out who's allowed to do what without out-of-band synchronization scripts. And those access tokens are a straight upgrade over passwords: even if they're somehow stolen, they are time-limited, they are optionally revocable, and they can be scoped to specific actions.

== Extension Points ==

This patchset provides several points of customization:

Server-side validation is farmed out entirely to an extension, which we do not provide. (Each OAuth provider is free to come up with its own proprietary method of verifying its access tokens, and so far the big players have absolutely not standardized.) Depending on the provider, the extension may need to contact an external server to see what the token has been authorized to do, or it may be able to do that offline using signing keys and an agreed-upon token format.

The client driver using libpq may replace the device authorization prompt (which by default is done on standard error), for example to move it into an existing GUI, display a scannable QR code instead of a link, and so on.

The driver may also replace the entire OAuth flow. For example, a client that already interacts with browsers may be able to use one of the more standard web-based methods to get an access token. And clients attached to a service rather than an end user could use a more straightforward server-to-server flow, with pre-established credentials.

== Architecture ==

The client needs to speak HTTP, which is implemented entirely with libcurl. Originally, I used another OAuth library for rapid prototyping, but the quality just wasn't there and I ported the implementation. An internal abstraction layer remains in the libpq code, so if a better client library comes along, switching to it shouldn't be too painful.
The client-side hooks all go through a single extension point, so that we don't continually add entry points to the API for each new piece of authentication data that a driver may be able to provide. If we wanted to, we could potentially move the existing SSL passphrase hook into that, or even handle password retries within libpq itself, but I don't see any burning reason to do that now.

I wanted to make sure that OAuth could be dropped into existing deployments without driver changes. (Drivers will probably *want* to look at the extension hooks for better UX, but they shouldn't necessarily *have* to.) That has driven several parts of the design.

Drivers using the async APIs should continue to work without blocking, even during the long HTTP handshakes. So the new client code is structured as a typical event-driven state machine (similar to PQconnectPoll). The protocol machine hands off control to the OAuth machine during authentication, without really needing to know how it works, because the OAuth machine replaces the PQsocket with a general-purpose multiplexer that handles all of the HTTP sockets and events. Once that's completed, the OAuth machine hands control right back and we return to the Postgres protocol on the wire.

This decision led to a major compromise: Windows client support is nonexistent. Multiplexer handles exist on Windows (for example with WSAEventSelect, IIUC), but last I checked they were completely incompatible with Winsock select(), which means existing async-aware drivers would fail. We could compromise by providing synchronous-only support, or by cobbling together a socketpair plus thread pool (or IOCP?), or simply by saying that existing Windows clients need a new API other than PQsocket() to work properly. None of those approaches have been attempted yet, though.
== Areas of Concern ==

Here are the iffy things that a committer is signing up for:

The client implementation is roughly 3k lines, requiring domain knowledge of Curl, HTTP, JSON, and OAuth, the specifications of which are spread across several separate standards bodies. (And some big providers ignore those anyway.) The OAUTHBEARER mechanism is extensible, but not in the same way as HTTP. So sometimes, it looks like people design new OAuth features that rely heavily on HTTP and forget to "port" them over to SASL. That may be a point of future frustration.

C is not really anyone's preferred language for implementing an extensible authn/z protocol running on top of HTTP, and constant vigilance is going to be required to maintain safety. What's more, we don't really "trust" the endpoints we're talking to in the same way that we normally trust our servers. It's a fairly hostile environment for maintainers.

Along the same lines, our JSON implementation assumes some level of trust in the JSON data -- which is true for the backend, and can be assumed for a DBA running our utilities, but is absolutely not the case for a libpq client downloading data from Some Server on the Internet. I've been working to fuzz the implementation, and there are a few known problems registered in the CF already.

Curl is not a lightweight dependency by any means. Typically, libcurl is configured with a wide variety of nice options, a tiny subset of which we're actually going to use, but all that code (and its transitive dependencies!) is going to arrive in our process anyway. That might not be a lot of fun if you're not using OAuth.

It's possible that the application embedding libpq is also a direct client of libcurl. We need to make sure we're not stomping on their toes at any point.

== TODOs/Known Issues ==

The client does not deal well with verification failure at the moment; it just keeps retrying with a new OAuth handshake.
Some people are not going to be okay with just contacting any web server that Postgres tells them to. There's a more paranoid mode sketched out that lets the connection string specify the trusted issuer, but it's not complete.

The new code still needs to play well with orthogonal connection options, like connect_timeout and require_auth.

The server does not deal well with multi-issuer setups yet. And you only get one oauth_validator_library...

Harden, harden, harden. There are still a handful of inline TODOs around double-checking certain pieces of the response before continuing with the handshake. Servers should not be able to run our recursive descent parser out of stack. And my JSON code is using assertions too liberally, which will turn bugs into DoS vectors. I've been working to fit a fuzzer into more and more places, and I'm hoping to eventually drive it directly from the socket.

Documentation still needs to be filled in. (Thanks, Daniel, for your work here!)

== Future Features ==

There is no support for token caching (refresh or otherwise). Each new connection needs a new approval, and the only way to change that for v1 is to replace the entire flow. I think that's eventually going to annoy someone. The question is, where do you persist it? Does that need to be another extensibility point?

We already have pretty good support for client certificates, and it'd be great if we could bind our tokens to those. That way, even if you somehow steal the tokens, you can't do anything with them without the private key! But the state of proof-of-possession in OAuth is an absolute mess, involving at least three competing standards (Token Binding, mTLS, DPoP). I don't know what's going to win.

--

Hope this helps! Next I'll be working to fold the patches together, as discussed upthread.

Thanks,
--Jacob