On 2020/06/29 16:05, Julien Rouhaud wrote:
Hi,

On Mon, Jun 29, 2020 at 7:49 AM Tharakan, Robins <tha...@amazon.com> wrote:

During a fully-cached SELECT-only test using pgbench, Postgres v13Beta1 shows

Thanks for the benchmark!


a ~45% performance drop [2] at high DB connection counts (when compared with
v12.3).

That's bad :(



Disabling pg_stat_statements.track_planning (which is 'On' by default)
brings the TPS numbers up to v12.3 levels.

The inflection point (in this test case) is 128 connections, beyond which the
TPS numbers are consistently low. Looking at the mailing list [1], this issue
didn't surface earlier, possibly because the regression is trivial at low
connection counts.

It would be great if this could be optimized further, or track_planning
disabled by default, so as not to trip users upgrading from v12 with
pg_stat_statements enabled (but otherwise not particularly interested in
track_planning).

Your benchmark result seems to suggest that the cause of the problem is
contention on the per-query spinlock in pgss_store(). Right?
This lock contention is likely to happen when multiple sessions run
the same queries.

One idea to reduce that lock contention is to split the per-query spinlock
into two: one for planning and the other for execution. pgss_store() would
then determine which lock to use based on the given "kind" argument.
To make this idea work, every pgss counter such as shared_blks_hit would
also need to be split into two, i.e., one for planning and one for execution.
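
To make that concrete, here is a minimal, hypothetical sketch of such a
layout; this is not the actual pg_stat_statements code, and apart from
SpinLockAcquire/SpinLockRelease and the planning/execution "kind" distinction
discussed above, every name below is made up for illustration:

/*
 * Hypothetical sketch: one spinlock and one set of counters per kind
 * (planning vs. execution), so the planner and executor hooks no longer
 * contend on the same per-entry lock.
 */
#include "postgres.h"
#include "storage/spin.h"

typedef enum pgssStoreKindSketch
{
    SKETCH_PLAN = 0,
    SKETCH_EXEC,
    SKETCH_NUMKIND
} pgssStoreKindSketch;

/* Every counter is duplicated: one copy for planning, one for execution. */
typedef struct pgssKindCountersSketch
{
    int64       calls;
    double      total_time;
    int64       shared_blks_hit;
    int64       shared_blks_read;
    /* ... remaining counters ... */
} pgssKindCountersSketch;

typedef struct pgssEntrySketch
{
    pgssKindCountersSketch counters[SKETCH_NUMKIND];
    slock_t     mutex[SKETCH_NUMKIND];  /* one spinlock per kind */
} pgssEntrySketch;

/* Called with SKETCH_PLAN from the planner hook, SKETCH_EXEC from the executor hook. */
static void
pgss_store_sketch(pgssEntrySketch *entry, pgssStoreKindSketch kind,
                  double elapsed_ms, int64 blks_hit)
{
    volatile pgssKindCountersSketch *c = &entry->counters[kind];

    /* Planning and execution now touch different locks and counter sets. */
    SpinLockAcquire(&entry->mutex[kind]);
    c->calls += 1;
    c->total_time += elapsed_ms;
    c->shared_blks_hit += blks_hit;
    SpinLockRelease(&entry->mutex[kind]);
}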


These are some details around the above test:

pgbench: scale = 100 / threads = 16
test duration: 30s each
server: 96 vCPUs / 768GB - r5.24xl (AWS EC2 instance)
client: 72 vCPUs / 144GB - c5.18xl (AWS EC2 instance), co-located with the DB server (same AZ)
v12: REL_12_STABLE (v12.3)
v13Beta1: REL_13_STABLE (v13Beta1)
max_connections = 10000
shared_preload_libraries = 'pg_stat_statements'
shared_buffers = 128MB
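
A reproduction along these lines might look roughly like the following; the
database name and the exact list of client counts are assumptions, not taken
from the original test script:

  # initialize a scale-100 dataset, then run 30s select-only tests at
  # increasing client counts with 16 pgbench threads (client counts assumed)
  pgbench -i -s 100 bench
  for c in 16 32 64 128 256 512 1024; do
      pgbench -S -T 30 -j 16 -c "$c" bench
  done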

I can't reproduce this on my laptop, but I can certainly believe that
running the same 3 queries using more connections than available cores
will lead to extra overhead.

I disagree with the conclusion though.  It seems to me that if you
really have a workload that consists of these few queries and want
to get better performance, you'll use a connection pooler and/or
prepared statements anyway, which will make this overhead disappear
entirely and also yield an even bigger performance improvement.  A
quick test using pgbench -M prepared, with track_planning enabled and
still way too many connections, already shows a 25% improvement over
-M simple without track_planning.
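
For reference, that prepared-statement variant would be something like the
following; the client count and database name are assumptions:

  # select-only, prepared protocol, 30s, 16 threads (client count assumed)
  pgbench -S -M prepared -T 30 -j 16 -c 512 bench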

I understand your point. But IMO the default setting should basically be
the safer value, i.e., off, at least until the problem disappears.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

