On Tue, Jan 14, 2025 at 6:00 AM Jacob Champion < jacob.champ...@enterprisedb.com> wrote:
> On Mon, Jan 13, 2025 at 3:21 PM Jacob Champion > <jacob.champ...@enterprisedb.com> wrote: > > Next email will discuss the architectural bug that Kashif found. > > Okay, here goes. A standard OAuth connection attempt looks like this > (oh, I hope Gmail doesn't mangle it): > > Issuer User libpq Backend > | | | > | x -----> x -----> o [1] Startup Packet > | | | | > | | x <----- x [2] OAUTHBEARER Request > | | | | > | | x -----> x [3] Parameter Discovery > | | | | > | | x <----- o [4] Parameters Stored > | | | > | | | > | | | > | | x -----> o [5] New Startup Packet > | | | | > | | x <----- x [6] OAUTHBEARER Request > | | | | > x <----- x <----> x | > x <----- x <----> x | [7] OAuth Handshake > x <----- x <----> x | > | | | | > o | x -----> x [8] Send Token > | | | > | <----- x <----- x [9] Connection Established > | | | > x <----> x <----> x > x <----> x <----> x [10] Use the DB > . . . > . . . > . . . > > When the server first asks for a token via OAUTHBEARER (step 2), the > client doesn't necessarily know what the server's requirements are for > a given user. It uses the rest of the doomed OAUTHBEARER exchange to > store the issuer and scope information in the PGconn (step 3-4), then > disconnects and sets need_new_connection in PQconnectPoll() so that a > second connection is immediately opened (step 5). When the OAUTHBEARER > mechanism takes control the second time, it has everything it needs to > conduct the login flow with the issuer (step 7). It then sends the > obtained token to establish a connection (steps 8 onward). > > The problem is that step 7 is consuming the authentication_timeout for > the backend. I'm very good at completing these flows quickly, but if > you can't complete the browser prompts in time, you will simply not be > able to log into the server. Which is harsh to say the least. (Imagine > the pain if the standard psql password prompt timed out.) DBAs can get > around it by increasing the timeout, obviously, but that doesn't feel > very good as a solution. > > Last week I looked into a fix where libpq would simply try again with > the stored token if the backend hangs up on it during the handshake, > but I think that will end up making the UX worse. The token validation > on the server side isn't going to be instantaneous, so if the client > is able to complete the token exchange in 59 seconds and send it to > the backend, there's an excellent chance that the connection is still > going to be torn down in a way that's indistinguishable from a crash. > We don't want the two sides to fight for time. > > So I think what I'm going to need to do is modify v41-0003 to allow > the mechanism to politely hang up the connection while the flow is in > progress. This further decouples the lifetimes of the mechanism and > the async auth -- the async state now has to live outside of the SASL > exchange -- but I think it's probably more architecturally sound. Yell > at me if that sounds unmaintainable or if there's a more obvious fix > I'm missing. > > Huge thanks to Kashif for pointing this out! > Thanks Jacob, the latest patch fixed the issues. > > --Jacob >