Hi All,

We conducted performance testing of a bi-directional logical
replication setup, focusing on the primary use case of the
update_deleted feature.
To simulate a realistic scenario, we used a heavy workload with limited
concurrent updates and writes well distributed across the servers.

Used source
===========
pgHead commit 62a17a92833 + v47 patch set

Machine details
===============
Intel(R) Xeon(R) CPU E7-4890 v2 @ 2.80GHz, 88 cores, 503 GiB RAM

Test-1: Distributed Write Load
==============================
Highlight:
-----------
 - In a bi-directional logical replication setup, with
well-distributed write workloads and a thoughtfully tuned
configuration to minimize lag (e.g., through row filters), TPS
regression is minimal or even negligible.
 - Performance can be sustained with significantly fewer apply workers
than client connections on the publisher.

Setup:
--------
 - 2 nodes (node1 and node2) are created on the same machine with the
same configuration:
    autovacuum = false
    shared_buffers = '30GB'
    -- Also, worker- and logical-replication-related parameters were
increased as needed (see attached scripts for details).
 - Both nodes have two sets of pgbench tables initialized with *scale=300*:
   -- set1: pgbench_pub_accounts, pgbench_pub_tellers,
pgbench_pub_branches, and pgbench_pub_history
   -- set2: pgbench_accounts, pgbench_tellers, pgbench_branches, and
pgbench_history
 - Node1 is publishing all changes for the set1 tables and Node2 has
subscribed to the same.
 - Node2 is publishing all changes for the set2 tables and Node1 has
subscribed to the same.
Note: In all the tests, subscriptions are created with (origin = NONE),
since this is a bi-directional replication setup.
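For reference, a minimal sketch of one such pub-sub pair (names, ports,
and connection strings below are illustrative; the exact commands are in
the attached scripts):

    # On node1 (assumed to run on port 5432): publish the set1 tables.
    psql -p 5432 -d postgres -c "CREATE PUBLICATION pub_set1 FOR TABLE
        pgbench_pub_accounts, pgbench_pub_tellers, pgbench_pub_branches,
        pgbench_pub_history;"

    # On node2 (assumed port 5433): subscribe to node1's publication.
    # origin = none requests only changes that originated on the publisher
    # itself, which prevents replication loops in the bi-directional setup.
    # copy_data = off is assumed because both nodes start with the same
    # pre-initialized tables.
    psql -p 5433 -d postgres -c "CREATE SUBSCRIPTION sub_set1
        CONNECTION 'host=localhost port=5432 dbname=postgres'
        PUBLICATION pub_set1 WITH (origin = none, copy_data = off);"

    # The mirror pair for the set2 tables is created in the opposite
    # direction (publication on node2, subscription on node1).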

Workload Run:
---------------
 - On node1, pgbench (read-write) with option "-b simple-update" is run
on the set1 tables.
 - On node2, pgbench (read-write) with option "-b simple-update" is run
on the set2 tables.
 - #clients = 40
 - pgbench run duration = 10 minutes.
 - Results were measured over 3 runs of each case.
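As a rough sketch, the node2 run could look like the following (port,
database name, and thread count are assumptions; node1 needs an
equivalent custom script file, since the set1 tables do not use the
default table names that the built-in pgbench scripts expect):

    # 40 clients, built-in simple-update script, 10-minute run
    pgbench -p 5433 -c 40 -j 40 -T 600 -b simple-update postgres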

Test Runs:
- Six tests were run with varying numbers of pub-sub pairs; the TPS
reduction on both nodes for each case is shown below:

| Case | # Pub-Sub Pairs | TPS Reduction  |
| ---- | --------------- | -------------- |
| 01   | 30              | 0–1%           |
| 02   | 15              | 6–7%           |
| 03   | 5               | 7–8%           |
| 04   | 3               | 0–1%           |
| 05   | 2               | 14–15%         |
| 06   | 1 (no filters)  | 37–40%         |

 - With appropriate row filters and distribution of load across apply
workers, the performance impact of the update_deleted patch can be
minimized.
 - Just 3 pub-sub pairs are enough to keep TPS close to the baseline
for the given workload.
 - Poor distribution of replication workload (e.g., only 1–2 pub-sub
pairs) leads to higher overhead due to increased apply worker
contention.
~~~~

Detailed results for all the above cases:

case-01:
---------
 - Created 30 pub-sub pairs to distribute the replication load across
30 apply workers on each node (see the sketch below).
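For illustration, such a split can be done with row filters on the key
column so that each publication (and hence each apply worker on the
subscriber) handles a disjoint slice of the rows; the filter expression
below is a hypothetical example, the actual filters are in the attached
scripts:

    # On node1: the i-th publication (i = 0..29) carries only its slice of
    # pgbench_pub_accounts; aid is the primary key, so it can be referenced
    # in a row filter even though UPDATEs are published. Shown for i = 0.
    psql -p 5432 -d postgres -c "CREATE PUBLICATION pub_set1_00
        FOR TABLE pgbench_pub_accounts WHERE (aid % 30 = 0);"

    # On node2: one subscription per publication, i.e. one apply worker
    # per slice.
    psql -p 5433 -d postgres -c "CREATE SUBSCRIPTION sub_set1_00
        CONNECTION 'host=localhost port=5432 dbname=postgres'
        PUBLICATION pub_set1_00 WITH (origin = none, copy_data = off);"

    # How the remaining (smaller) set1 tables are spread across the 30
    # pairs is left to the attached scripts.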

Results:
| #run       | pgHead_Node1_TPS | patched_Node1_TPS | pgHead_Node2_TPS | patched_Node2_TPS |
| ---------- | ---------------- | ----------------- | ---------------- | ----------------- |
| 1          | 5633.377165      | 5579.244492       | 6385.839585      | 6482.775975       |
| 2          | 5926.328644      | 5947.035275       | 6216.045707      | 6416.113723       |
| 3          | 5522.804663      | 5542.380108       | 6541.031535      | 6190.123097       |
| median     | 5633.377165      | 5579.244492       | 6385.839585      | 6416.113723       |
| regression |                  | -1%               |                  | 0%                |
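For clarity, the regression figures here and in the cases below appear
to be derived from the medians as:

    regression (%) = (patched_median - pgHead_median) / pgHead_median * 100
    e.g. Node1: (5579.244492 - 5633.377165) / 5633.377165 * 100 ≈ -0.96, i.e. ~ -1%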

 - No regression
~~~~

case-02:
---------
 - #pub-sub pairs = 15

Results:
| #run       | pgHead_Node1_TPS | patched_Node1_TPS | pgHead_Node2_TPS | patched_Node2_TPS |
| ---------- | ---------------- | ----------------- | ---------------- | ----------------- |
| 1          | 8207.708475      | 7584.288026       | 8854.017934      | 8204.301497       |
| 2          | 8120.979334      | 7404.735801       | 8719.451895      | 8169.697482       |
| 3          | 7877.859139      | 7536.762733       | 8542.896669      | 8177.853563       |
| median     | 8120.979334      | 7536.762733       | 8719.451895      | 8177.853563       |
| regression |                  | -7%               |                  | -6%               |

 - There was a 6-7% TPS reduction on both nodes, which seems to be in an acceptable range.
~~~

case-03:
---------
 - #pub-sub pairs = 5

Results:
| #run       | pgHead_Node1_TPS | patched_Node1_TPS | pgHead_Node2_TPS | patched_Node2_TPS |
| ---------- | ---------------- | ----------------- | ---------------- | ----------------- |
| 1          | 12325.90315      | 11664.7445        | 12997.47104      | 12324.025         |
| 2          | 12060.38753      | 11370.52775       | 12728.41287      | 12127.61208       |
| 3          | 12390.3677       | 11367.10255       | 13135.02558      | 12036.71502       |
| median     | 12325.90315      | 11370.52775       | 12997.47104      | 12127.61208       |
| regression |                  | -8%               |                  | -7%               |

 - There was a 7-8% TPS reduction on both nodes, which seems to be in an acceptable range.
~~~

case-04:
---------
 -  #pub-sub pairs = 3

Results:
| #run       | pgHead_Node1_TPS | patched_Node1_TPS | pgHead_Node2_TPS | patched_Node2_TPS |
| ---------- | ---------------- | ----------------- | ---------------- | ----------------- |
| 1          | 13186.22898      | 12464.42604       | 13973.8394       | 13370.45596       |
| 2          | 13038.15817      | 13014.03906       | 13866.51966      | 13866.47395       |
| 3          | 13881.10513      | 13868.71971       | 14687.67444      | 14516.33854       |
| median     | 13186.22898      | 13014.03906       | 13973.8394       | 13866.47395       |
| regression |                  | -1%               |                  | -1%               |

 - No regression observed
~~~

case-05:
---------
 -  #pub-sub pairs = 2

Results:
| #run       | pgHead_Node1_TPS | patched_Node1_TPS | pgHead_Node2_TPS | patched_Node2_TPS |
| ---------- | ---------------- | ----------------- | ---------------- | ----------------- |
| 1          | 15936.98792      | 13563.98476       | 16734.35292      | 14527.22942       |
| 2          | 16031.23003      | 13648.24979       | 16958.49609      | 14657.80008       |
| 3          | 16113.79935      | 13550.68329       | 17029.5035       | 14509.84068       |
| median     | 16031.23003      | 13563.98476       | 16958.49609      | 14527.22942       |
| regression |                  | -15%              |                  | -14%              |

 - TPS dropped by 14-15% on both nodes.
~~~

case-06:
---------
 - #pub-sub pairs = 1; no row filters are used on either node

Results:
| #run       | pgHead_Node1_TPS | patched_Node1_TPS | pgHead_Node2_TPS | patched_Node2_TPS |
| ---------- | ---------------- | ----------------- | ---------------- | ----------------- |
| 1          | 22900.06507      | 13609.60639       | 23254.25113      | 14592.25271       |
| 2          | 22110.98426      | 13907.62583       | 22755.89945      | 14805.73717       |
| 3          | 22719.88901      | 13246.41484       | 23055.70406      | 14256.54223       |
| median     | 22719.88901      | 13609.60639       | 23055.70406      | 14592.25271       |
| regression |                  | -40%              |                  | -37%              |

 - The observed regression is 37-40% on both nodes.
~~~~


Test-2: High concurrency
===========================
Highlight:
------------
 Despite poorly distributed writes across the servers and highly
concurrent updates, distributing the replication load across multiple
apply workers limited the TPS drop to just 15–18%.

Setup:
---------------
 - 2 nodes (node1 and node2) are created with the same configuration as in Test-1
 - Both nodes have the same set of pgbench tables initialized with
scale=60 (small tables, to increase concurrent updates)
 - Both nodes are subscribed to each other for all the changes.
  -- 15 pub-sub pairs are created using row filters to distribute the
load, and all the subscriptions are created with (origin = NONE).

Workload Run:
---------------
 - On both nodes, the default pgbench (read-write) script is run on the tables.
 - #clients = 15
 - pgbench run duration = 5 minutes.
 - Results were measured over 2 runs of each case.

Results:

Node1 TPS:
| #run       | pgHead_Node1_TPS | patched_Node1_TPS |
| ---------- | ---------------- | ----------------- |
| 1          | 9585.470749      | 7660.645249       |
| 2          | 9442.364918      | 8035.531482       |
| median     | 9513.917834      | 7848.088366       |
| regression |                  | -18%              |

Node2 TPS:

| #run       | pgHead_Node2_TPS | patched_Node2_TPS |
| ---------- | ---------------- | ----------------- |
| 1          | 9485.232611      | 8248.783417       |
| 2          | 9468.894086      | 7938.991136       |
| median     | 9477.063349      | 8093.887277       |
| regression |                  | -15%              |

- Under highly concurrent writes to the same small tables, contention
increases, and the TPS drop is 15-18% on both nodes.
~~~~

The scripts used for the above tests are attached.

--
Thanks,
Nisha

<<attachment: bi_dir_test_scripts.zip>>
