On Thursday, December 26, 2024 3:50 AM Tom Lane <t...@sss.pgh.pa.us> Hi,
> In connection with the discussion at [1], I started to look at exactly which > server > processes ought to be subject to connection limits (datconnlimit, > ACL_CONNECT, and related checks). The current situation seems to be an > inconsistent mess. > > Looking at InitPostgres, InitializeSessionUserId, and CheckMyDatabase, we > have several different permissions and resource-limiting checks that may be > enforced against an incoming process: > ReservedConnections/SuperuserReservedConnections > rolcanlogin > rolconnlimit > datallowconn > datconnlimit > database's ACL_CONNECT privilege > > I want to argue that ReservedConnections, rolconnlimit, and datconnlimit > should only be applied to regular backends. It makes no particular sense to > enforce them against autovac workers, background workers, or wal senders, > because each of those types of processes has its own resource-limiting > PGPROC pool. It's especially bad to enforce them against parallel workers, > since that creates edge-case failures that make the use of parallelism not > transparent to applications. We hacked around that at 73c9f91a1, but I think > that should be reverted in favor of not applying the check at all. > > I further argue that rolcanlogin, datallowconn, and ACL_CONNECT should not > be checked in a parallel worker, again primarily on the grounds that this > creates > parallelism-specific failures (cf [1]). > The two scenarios where this occurs are (1) permissions were revoked since > the leader process connected, or (2) leader is currently running as a role > that > wouldn't have permission to connect on its own. We don't attempt to kick out > the leader process when either of those things happen, so why should we > prevent it from using parallelism? > > The current situation for each of these checks is: > > ReservedConnections is enforced if !am_superuser && !am_walsender, so it is > enforced against non-superuser background workers, which is silly because > BG workers have their own PGPROC pool; moreover, what's the argument for > letting walsenders but not other kinds of background processes escape this? > I propose changing it to apply only to regular backends. > > rolcanlogin is enforced if IsUnderPostmaster and we reach > InitializeSessionUserId, which basically reduces to regular backends, parallel > workers, logrep workers, and walsenders. Seems reasonable for logrep > workers and walsenders which represent fresh logins, but not for parallel > workers. I propose fixing this by making ParallelWorkerMain pass > BGWORKER_BYPASS_ROLELOGINCHECK. > > rolconnlimit is enforced if IsUnderPostmaster and we reach > InitializeSessionUserId and it's not a superuser. So that applies to > non-superuser parallel workers, logrep workers, and walsenders, and I don't > think it's reasonable to apply it to any of them since those all come out of > other > PGPROC pools. I propose switching that to apply only to regular backends. > > BTW, I kind of wonder why rolconnlimit is ineffectual for superusers, > especially > when rolcanlogin does apply to them. Not a bug exactly, but it sure seems > inconsistent. If you've taken the trouble to set it you'd expect it to work. > Shall > we take out the !is_superuser check? > > datallowconn is enforced against all non-standalone, non-AV-worker > processes that connect to a specific database, except bgworkers that pass > BGWORKER_BYPASS_ALLOWCONN (nothing in core except test module). > So again that includes parallel workers, logrep workers, and walsenders. > Again this seems reasonable for logrep workers and walsenders but not for > parallel workers. I propose fixing this by making ParallelWorkerMain pass > BGWORKER_BYPASS_ALLOWCONN. > > datconnlimit is enforced against all non-superuser processes, including > per-DB walsenders and BG workers (see above). > This is fairly dubious given that they have their own PGPROC pools. > I propose switching that to apply only to regular backends, too. > > ACL_CONNECT is enforced against all non-superuser processes, including > per-DB walsenders and BG workers (includes parallel workers, subscription > apply workers, logrep workers). > Perhaps that's OK for most, but I argue not for parallel workers; maybe skip > it if > BGWORKER_BYPASS_ALLOWCONN? > > Also, the enforcement of datconnlimit and rolconnlimit is inconsistent in > another way: our counting of the pre-existing processes is pretty random. > CountDBConnections is not consistent with either the current set of processes > that datconnlimit is enforced against, or my proposal to enforce it only > against > regular backends. It counts anything that is not > AmBackgroundWorkerProcess, including AV workers and per-DB walsenders. > I think it should count only regular backends, because anything else leads to > weird inconsistencies in whether a rejection occurs. > > The same applies to CountUserBackends (used for rolconnlimit check). > I argue these two functions should count only regular backends, and the > enforcement should likewise be only against regular backends. Personally, I find these proposals to be reasonable. > Another recently-created problem is that the "slotsync worker" > process type we added in v17 hasn't been considered in any of this. > In particular, unlike every other process type that can obtain a PGPROC, we > did > not consider it in the MaxBackends calculation: > > /* the extra unit accounts for the autovacuum launcher */ > MaxBackends = MaxConnections + autovacuum_max_workers + 1 + > max_worker_processes + max_wal_senders; > > This means AFAICS that it effectively competes for one of the MaxConnections > PGPROC slots, meaning that it could fail for lack of a slot or could lock out > a > client that should have been able to connect. > Shouldn't we have added another dedicated PGPROC slot for it? > (proc.c would require some additional work to make that happen.) Thanks for reporting the issue! I can reproduce the issue E.g., slotsync worker cannot start when user backends have reached the max_connections limit. And I agree that it would be better to add another dedicated PGPROC for it. I have prepared a patch attached to address this. > I wonder if the AV launcher and slotsync worker could be reclassified as > "auxiliary > processes" instead of being their own weird animal. It appears that the current aux processes do not run transactions as stated in the comments[1], so we may need to somehow release this restriction to achieve the goal. > [1] > https://www.postgresql.org/message-id/8befc845430ba1ae3748af900af2987 > 88e579c89.camel%40cybertec.at > Best Regards, Hou zj
v1-0001-Reserve-a-dedicated-PGPROC-slot-for-slotsync-work.patch
Description: v1-0001-Reserve-a-dedicated-PGPROC-slot-for-slotsync-work.patch