> It is usually not acceptable to run applications with
> synchronous_commit=off, so once you have identified that the bottleneck
> is in implementing synchronous_commit=on, you probably need to take a
> deep dive into your hardware to figure out why it isn't performing the
> way you need/want/expect it to. Tuning the server under
> synchronous_commit=off when you don't intend to run your production
> server with that setting is unlikely to be fruitful.
I do not intend to run the server with synchronous_commit=off, but based
on my limited knowledge, I'm wondering if all these observations are
somehow related and are caused by the same underlying bottleneck (or
misconfiguration):

1) At higher concurrency levels, TPS for synchronous_commit=off is lower
   with the optimised settings than with the default settings.

2) At ALL concurrency levels, TPS for synchronous_commit=on is lower with
   the optimised settings (irrespective of the shared_buffers value) than
   with the default settings.

3) At higher concurrency levels, optimised + synchronous_commit=on +
   shared_buffers=2G gives HIGHER TPS than optimised +
   synchronous_commit=off + shared_buffers=8G.

Here are the (completely counter-intuitive) numbers for these
observations:

+--------+-----------------------------------------------------------------+------------------------+
|        |                      synchronous_commit=on                      | synchronous_commit=off |
+--------+-----------------------------------------------------------------+------------------------+
| client | Mostly defaults [1] |    Optimised [2]    |    Optimised [2]    |     Optimised [2]      |
|        |                     | + shared_buffers=2G | + shared_buffers=8G | + shared_buffers=8G    |
+--------+---------------------+---------------------+---------------------+------------------------+
|      1 |               80-86 |               74-77 |               75-75 |                169-180 |
+--------+---------------------+---------------------+---------------------+------------------------+
|      6 |             350-376 |             301-304 |             295-300 |              1265-1397 |
+--------+---------------------+---------------------+---------------------+------------------------+
|     12 |             603-619 |             476-488 |             485-493 |              1746-2352 |
+--------+---------------------+---------------------+---------------------+------------------------+
|     24 |            947-1015 |             678-739 |             723-770 |              1869-2518 |
+--------+---------------------+---------------------+---------------------+------------------------+
|     48 |           1435-1512 |            950-1043 |           1029-1086 |              1912-2818 |
+--------+---------------------+---------------------+---------------------+------------------------+
|     96 |           1769-1811 |           3337-3459 |           1302-1346 |              1546-1753 |
+--------+---------------------+---------------------+---------------------+------------------------+
|    192 |           1857-1992 |           3613-3715 |           1269-1345 |              1332-1508 |
+--------+---------------------+---------------------+---------------------+------------------------+
|    384 |           1667-1793 |           3180-3300 |           1262-1364 |              1356-1450 |
+--------+---------------------+---------------------+---------------------+------------------------+

> In case you do intend to run with synchronous_commit=off, or if you are
> just curious: running with a very high number of active connections
> often reveals subtle bottlenecks and interactions, and is very dependent
> on your hardware. Unless you actually intend to run your server with
> synchronous_commit=off and with a large number of active connections, it
> is probably not worth investigating this.

Please see the table above. The reason I'm digging deeper into this is
observation (2) above: I am unable to come up with any optimised settings
that perform better than the default settings at the concurrency levels I
care about (100-150).

> I'm more interested in the low end; you should do much better than those
> reported numbers when clients=1 and synchronous_commit=off with the data
> on SSD. I think you said that pgbench is running on a different machine
> than the database, so perhaps it is just network overhead that is
> keeping this value down. What happens if you run them on the same
> machine?
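For reference, the two setups being compared look roughly like this (the
host name, run duration, scale factor, and database name are placeholders;
the exact invocation isn't shown above):

# initialise the test database once (scale factor is illustrative)
pgbench -i -s 100 pgbench

# run from a separate client machine, over the 1 Gbps network
pgbench -h db-server -c 1 -j 1 -T 300 pgbench

# run on the database server itself, so no network hop is involved
pgbench -c 1 -j 1 -T 300 pgbench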
I'm currently running this, but the early numbers are surprising. For
client=1, the numbers for the optimised settings + shared_buffers=2G are:

-- pgbench run over a 1 Gbps network: 74-77 tps
-- pgbench run on the same machine: 152-153 tps (is this absolute number
   good enough given my hardware?)

Is the 1 Gbps network the bottleneck? Does it explain the three
observations given above? I'll wait for the current set of benchmarks to
finish, then re-run the benchmarks over the network while monitoring
network utilisation (a sketch of that measurement follows the footnotes
below).

[1] "Mostly default" settings are whatever ships with Ubuntu 18.04 + PG
11. A snippet of the relevant settings is given below:

max_connections = 400
work_mem = 4MB
maintenance_work_mem = 64MB
shared_buffers = 128MB
temp_buffers = 8MB
effective_cache_size = 4GB
wal_buffers = -1
wal_sync_method = fsync
max_wal_size = 1GB
autovacuum = off          # auto-vacuuming was disabled

[2] Optimised settings:

max_connections = 400
shared_buffers = 8GB      # or 2GB, depending upon which scenario was being evaluated
effective_cache_size = 24GB
maintenance_work_mem = 2GB
checkpoint_completion_target = 0.7
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 3495kB
min_wal_size = 1GB
max_wal_size = 2GB
max_worker_processes = 12
max_parallel_workers_per_gather = 6
max_parallel_workers = 12
autovacuum = off          # auto-vacuuming was disabled
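As for monitoring the network during the re-run, here is a minimal
sketch, assuming the sysstat package and a host named db-server (both are
placeholders; any per-interface bandwidth monitor such as iftop would do):

# on the database server: per-interface throughput, 1-second samples
sar -n DEV 1

# from the pgbench machine: the round-trip latency floor to the server
ping -c 20 db-server

With a single client, the limiter is more likely round-trip latency than
bandwidth: assuming the default TPC-B-like pgbench script, each
transaction sends seven statements one at a time, so 75 tps remote
(~13.3 ms/transaction) versus 152 tps local (~6.6 ms/transaction) implies
the network is adding roughly 1 ms per statement round trip, i.e. nowhere
near saturating a 1 Gbps link.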