My team recently received a report about connection establishment times increasing substantially from v16 onwards. Upon further investigation, this seems to have something to do with commit 7389aad (which moved a lot of postmaster code out of signal handlers) in conjunction with workloads that generate many parallel workers. I've attached a set of reproduction steps. The issue seems to be worst on larger machines (e.g., r8g.48xlarge, r5.24xlarge) when max_parallel_workers/max_worker_process is set very high (>= 48).
Our theory is that commit 7389aad (and follow-ups like commit 239b175) made parallel worker processing much more responsive to the point of contending with incoming connections, and that before this change, the kernel balanced the execution of the signal handlers and ServerLoop() to prevent this. I don't have a concrete proposal yet, but I thought it was still worth starting a discussion. TBH I'm not sure we really need to do anything since this arguably comes down to a trade-off between connection and worker responsiveness. -- nathan
setup: psql -h "$host" -U "$user" postgres <<EOF create database mydb; \c mydb create table mytable (a int); insert into mytable (a) select * from generate_series(1, 500000); EOF -- start many parallel workers (may need to change -c and -j): pgbench -h "$host" -U "$user" mydb -c 30 -j 2 -T 300 -f <(echo ' begin; select count(*) from mytable; end;') -- while pgbench is running, check connection establishment times: while :; do time pg_isready -h "$host" -U "$user" -d postgres sleep 1 done