On Thursday, December 26, 2024 3:50 AM Tom Lane <t...@sss.pgh.pa.us>

Hi,

> In connection with the discussion at [1], I started to look at exactly which 
> server
> processes ought to be subject to connection limits (datconnlimit,
> ACL_CONNECT, and related checks).  The current situation seems to be an
> inconsistent mess.
> 
> Looking at InitPostgres, InitializeSessionUserId, and CheckMyDatabase, we
> have several different permissions and resource-limiting checks that may be
> enforced against an incoming process:
>       ReservedConnections/SuperuserReservedConnections
>       rolcanlogin
>       rolconnlimit
>       datallowconn
>       datconnlimit
>       database's ACL_CONNECT privilege
> 
> I want to argue that ReservedConnections, rolconnlimit, and datconnlimit
> should only be applied to regular backends.  It makes no particular sense to
> enforce them against autovac workers, background workers, or wal senders,
> because each of those types of processes has its own resource-limiting
> PGPROC pool.  It's especially bad to enforce them against parallel workers,
> since that creates edge-case failures that make the use of parallelism not
> transparent to applications.  We hacked around that at 73c9f91a1, but I think
> that should be reverted in favor of not applying the check at all.
> 
> I further argue that rolcanlogin, datallowconn, and ACL_CONNECT should not
> be checked in a parallel worker, again primarily on the grounds that this 
> creates
> parallelism-specific failures (cf [1]).
> The two scenarios where this occurs are (1) permissions were revoked since
> the leader process connected, or (2) leader is currently running as a role 
> that
> wouldn't have permission to connect on its own.  We don't attempt to kick out
> the leader process when either of those things happen, so why should we
> prevent it from using parallelism?
> 
> The current situation for each of these checks is:
> 
> ReservedConnections is enforced if !am_superuser && !am_walsender, so it is
> enforced against non-superuser background workers, which is silly because
> BG workers have their own PGPROC pool; moreover, what's the argument for
> letting walsenders but not other kinds of background processes escape this?
> I propose changing it to apply only to regular backends.
> 
> rolcanlogin is enforced if IsUnderPostmaster and we reach
> InitializeSessionUserId, which basically reduces to regular backends, parallel
> workers, logrep workers, and walsenders.  Seems reasonable for logrep
> workers and walsenders which represent fresh logins, but not for parallel
> workers.  I propose fixing this by making ParallelWorkerMain pass
> BGWORKER_BYPASS_ROLELOGINCHECK.
> 
> rolconnlimit is enforced if IsUnderPostmaster and we reach
> InitializeSessionUserId and it's not a superuser.  So that applies to
> non-superuser parallel workers, logrep workers, and walsenders, and I don't
> think it's reasonable to apply it to any of them since those all come out of 
> other
> PGPROC pools.  I propose switching that to apply only to regular backends.
> 
> BTW, I kind of wonder why rolconnlimit is ineffectual for superusers, 
> especially
> when rolcanlogin does apply to them.  Not a bug exactly, but it sure seems
> inconsistent.  If you've taken the trouble to set it you'd expect it to work. 
>  Shall
> we take out the !is_superuser check?
> 
> datallowconn is enforced against all non-standalone, non-AV-worker
> processes that connect to a specific database, except bgworkers that pass
> BGWORKER_BYPASS_ALLOWCONN (nothing in core except test module).
> So again that includes parallel workers, logrep workers, and walsenders.
> Again this seems reasonable for logrep workers and walsenders but not for
> parallel workers.  I propose fixing this by making ParallelWorkerMain pass
> BGWORKER_BYPASS_ALLOWCONN.
> 
> datconnlimit is enforced against all non-superuser processes, including
> per-DB walsenders and BG workers (see above).
> This is fairly dubious given that they have their own PGPROC pools.
> I propose switching that to apply only to regular backends, too.
> 
> ACL_CONNECT is enforced against all non-superuser processes, including
> per-DB walsenders and BG workers (includes parallel workers, subscription
> apply workers, logrep workers).
> Perhaps that's OK for most, but I argue not for parallel workers; maybe skip 
> it if
> BGWORKER_BYPASS_ALLOWCONN?
> 
> Also, the enforcement of datconnlimit and rolconnlimit is inconsistent in
> another way: our counting of the pre-existing processes is pretty random.
> CountDBConnections is not consistent with either the current set of processes
> that datconnlimit is enforced against, or my proposal to enforce it only 
> against
> regular backends.  It counts anything that is not
> AmBackgroundWorkerProcess, including AV workers and per-DB walsenders.
> I think it should count only regular backends, because anything else leads to
> weird inconsistencies in whether a rejection occurs.
> 
> The same applies to CountUserBackends (used for rolconnlimit check).
> I argue these two functions should count only regular backends, and the
> enforcement should likewise be only against regular backends.

Personally, I find these proposals to be reasonable.

> Another recently-created problem is that the "slotsync worker"
> process type we added in v17 hasn't been considered in any of this.
> In particular, unlike every other process type that can obtain a PGPROC, we 
> did
> not consider it in the MaxBackends calculation:
> 
>       /* the extra unit accounts for the autovacuum launcher */
>       MaxBackends = MaxConnections + autovacuum_max_workers + 1 +
>               max_worker_processes + max_wal_senders;
> 
> This means AFAICS that it effectively competes for one of the MaxConnections
> PGPROC slots, meaning that it could fail for lack of a slot or could lock out 
> a
> client that should have been able to connect.
> Shouldn't we have added another dedicated PGPROC slot for it?
> (proc.c would require some additional work to make that happen.)

Thanks for reporting the issue! I can reproduce the issue E.g., slotsync worker
cannot start when user backends have reached the max_connections limit. And I
agree that it would be better to add another dedicated PGPROC for it. I have
prepared a patch attached to address this.


> I wonder if the AV launcher and slotsync worker could be reclassified as 
> "auxiliary
> processes" instead of being their own weird animal.

It appears that the current aux processes do not run transactions as stated in 
the
comments[1], so we may need to somehow release this restriction to achieve the
goal.


> [1]
> https://www.postgresql.org/message-id/8befc845430ba1ae3748af900af2987
> 88e579c89.camel%40cybertec.at
> 

Best Regards,
Hou zj

Attachment: v1-0001-Reserve-a-dedicated-PGPROC-slot-for-slotsync-work.patch
Description: v1-0001-Reserve-a-dedicated-PGPROC-slot-for-slotsync-work.patch

Reply via email to