On Mon, May 25, 2026 at 10:13 AM Daniel Gustafsson <[email protected]> wrote:
> > On 24 May 2026, at 22:04, Sehrope Sarkuni <[email protected]> wrote: > > The first patch fixes parsing of SCRAM iteration counts in both the > backend SCRAM verifier parser and libpq's server-first-message parser. > Previously both paths parsed into a long and then stored the result in an > int, so values in (INT_MAX, LONG_MAX] could be accepted by strtol() and > then narrowed incorrectly. I don't think this allows for any invalid logins > as the password verifier would have tried to verify a different iteration > count, but it's still wrong. > > The iteration count parsing you refer to is extracting the iterations from > the > stored password, it's not parsing user input in any way. The iteration > count > in set in the password using scram_sha_256_iterations, which in turn can > be set > by the GUC scram_iterations which is limited to 1..INT_MAX (before being > accessible as a GUC it was hardcoded to 4096). > > strtol() to an int without checking if the parsed value exceeds INT_MAX > isn't > good code hygiene, and we should fix that, but it cannot in practice cause > truncation AFAICS. > > Can you show a sequence of generating a SCRAM secret which has an iteration > count outside of 0..INT_MAX, or one which doesn't correspond to the > setting in > scram_iterations? > You can directly update pg_authid to skip the empty password check entirely. We do that in pgjdbc to test out connecting against some nonsensical values. A regular user can also specify the pre-hashed password in CREATE / ALTER USER. The server will still try to verify the password to ensure that it is not an empty string: // src/backend/commands/user.c#L442 if (password[0] == '\0' || plain_crypt_verify(stmt->role, password, "", &logdetail) == STATUS_OK) { ereport(NOTICE, (errmsg("empty string is not a valid password, clearing password"))); new_record_nulls[Anum_pg_authid_rolpassword - 1] = true; } Normally that would prevent you from trying to create something with a huge value of iterations as it'd never finish. But the long-to-int overflow means that it finishes immediately. postgres=# CREATE USER scram_test PASSWORD 'abcd'; CREATE ROLE # Default of 4096 iterations postgres=# SELECT rolpassword FROM pg_authid WHERE rolname = 'scram_test'; rolpassword --------------------------------------------------------------------------------------------------------------------------------------- SCRAM-SHA-256$4096:PTh5Qe4SmvJnUSHsa6CJpg==$rbWsmPJVYzgjIwnVfmfZ43A0FR0eThILm+TuYpYF4M0=:gFahnsW7KHa73RW1Vk/eP6avCOWf1O2LgCW1ZgiRHtw= (1 row) # This hangs effectively for ever as it's trying 2^31-1 iterations to verify that it's not empty string postgres=# ALTER USER scram_test PASSWORD 'SCRAM-SHA-256$2147483647:PTh5Qe4SmvJnUSHsa6CJpg==$rbWsmPJVYzgjIwnVfmfZ43A0FR0eThILm+TuYpYF4M0=:gFahnsW7KHa73RW1Vk/eP6avCOWf1O2LgCW1ZgiRHtw='; ^C Cancel request sent ERROR: canceling statement due to user request Time: 4812.007 ms (00:04.812) # This completes instantly because it overflows and the empty string password check passes. Trying to connect gives a scram iterations from the server of "-1". postgres=# ALTER USER scram_test PASSWORD 'SCRAM-SHA-256$2147483648:PTh5Qe4SmvJnUSHsa6CJpg==$rbWsmPJVYzgjIwnVfmfZ43A0FR0eThILm+TuYpYF4M0=:gFahnsW7KHa73RW1Vk/eP6avCOWf1O2LgCW1ZgiRHtw='; ALTER ROLE Time: 0.773 ms > > Patches 0003 through 0005 address a separate client-side resource issue. > In a SCRAM exchange, the PBKDF2 iteration count is supplied by the server. > A hostile or misconfigured server can advertise a very large iteration > count and keep a blocking libpq connection inside scram_SaltedPassword() > beyond the caller's connect_timeout. The backend has CHECK_FOR_INTERRUPTS() > inside the loop, but the frontend previously had no equivalent. > > A server which request the iteration count of the stored secret, whatever > it > is, cannot be considered misconfigured or hostile. A server which sends an > incorrect iteration count in order to reject a connection after causing the > client to perform CPU work is hostile (but I'd be quite glad if a hostile > server rejected my connection rather than tricking me into logging in and > stealing/corrupting data). > The attack vector here is a client side denial of service. I previously submitted this to the security list and it was deemed out of scope (as a security issue for core) as it's a client side cpu / memory issue. A real world example of it would be a SaaS or reporting system that connects to user specified databases. A malicious user who could specify the remote database host / port (common in that kind of platform) could get the app to perform the PBKDF2 iterations forever. There's plenty of other things a malicious server can do as well (e.g., forcing allocation of infinite memory). So again out of scope for core as a security issue, but still a problem in specific use cases. > Making the SCRAM calculation honor the connection timeout sounds like a > good > way to address the latter. > > > These patches mirror the blocking connection attempt's deadline into > PGconn, add an optional interrupt callback to the common SCRAM PBKDF2 > helper, and have libpq use that callback to abort once the in-flight > bl:qaocking connection deadline has expired. The test modifies a SCRAM > verifier to advertise a very large iteration count and verifies that > connect_timeout interrupts the PBKDF2 loop. > > > > This protection applies to the blocking connection path, such as > PQconnectdb(). It does not make connect_timeout apply automatically to > applications driving PQconnectPoll() themselves. > > > > I considered passing the connection deadline directly into the SCRAM > PBKDF2 helper, but used an interrupt callback instead. That keeps the > common SCRAM code independent of libpq's timeout representation and allows > the same mechanism to support other frontend abort conditions, such as > cancellation of an in-progress connection attempt. > > Making sure SCRAM secret calculation doesn't exceed connection_timeout > sounds > like a good idea. However, ode shared between frontend and backend should > IMHO > not have parameters which only work in the frontend. I suppose it could be enabled for the backend as well. It'd end up being an additional if-check to skip the NULL arg. Alternatively the backend could supply CFI/wrapper as the callback. Then the scram code would not know anything about who's invoking it. > I'm also not convinced > that it's a good idea to add a callback which in your case gets the entire > PGconn object. I'd prefer a check more like the CFI we have in the backend > code. > Is there a front end example of that to look at? Or are you describing something more general that does not exist yet? Another issue is the interval for checking. One point of allowing > configurable > iteration counts was to allow constrained low-power IOT platforms to use > SCRAM > by using a much lower iteration count. Only checking for timeout every > 4096 > iterations might be well past the connection timeout. Perhaps the > interval of > checking should be a function of the iterations sent by the server? > The time check itself is relatively cheap compared to the hashing so lowering it something more fine grained should not matter much. I picked 4096 to match the default scram iterations so it does not even run in the normal case. Maybe 256? The SCRAM RFC estimates it taking .5 seconds to run 4096 iterations on 2015 "current mobile handsets". Would figure it to be even less for devices 10+ years later. And connect timeout granularity is in seconds. > > The attached patch currently uses a default cap of 100K. That is well > above PostgreSQL's normal SCRAM iteration count (4K), but it is still a > client-side behavior change for installations using unusually high SCRAM > iteration counts. For reference, we added a similar patch to pgjdbc with > the same 100K default. > > I am off the cuff -1 on adding more connection parameters for niche > usecases > like this (we already have many enough to make it confusing), it will be > very > hard to document what an appropriate setting is, and getting it wrong might > mean rejecting connections in err. > For additional reference, we added similar parameters to the java driver pgjdbc and the node driver node-postgres. Both with a 100K limit. Both also had a bit more of a direct issue with the large values (v.s. libpq) as the java driver would create a background thread for the login (which would get pinned to 100% forever with a huge iteration count) and the node driver would also hang indefinitely (as node is single threaded and the crypto functions are not interruptible). Regards, -- Sehrope Sarkuni Founder & CEO | JackDB, Inc. | https://www.jackdb.com/
