Hi All,

We conducted performance testing of a bi-directional logical replication setup, focusing on the primary use case of the update_deleted feature. To simulate a realistic scenario, we used a high workload with limited concurrent updates and writes that are well distributed across the servers.
Used source
===========
pgHead commit 62a17a92833 + v47 patch set

Machine details
===============
Intel(R) Xeon(R) CPU E7-4890 v2 @ 2.80GHz
CPU(s): 88 cores, RAM: 503 GiB

Test-1: Distributed Write Load
==============================
Highlight:
-----------
- In a bi-directional logical replication setup with well-distributed write workloads and a configuration tuned to minimize lag (e.g., through row filters), the TPS regression is minimal or even negligible.
- Performance can be sustained with significantly fewer apply workers than the number of client connections on the publisher.

Setup:
--------
- 2 nodes (node1 and node2) are created on the same machine with the same configuration:
  -- autovacuum = false
  -- shared_buffers = '30GB'
  -- Worker and logical-replication-related parameters were also increased as required (see the attached scripts for details).
- Both nodes have two sets of pgbench tables initialized with scale=300:
  -- set1: pgbench_pub_accounts, pgbench_pub_tellers, pgbench_pub_branches, and pgbench_pub_history
  -- set2: pgbench_accounts, pgbench_tellers, pgbench_branches, and pgbench_history
- Node1 publishes all changes to the set1 tables, and node2 subscribes to them.
- Node2 publishes all changes to the set2 tables, and node1 subscribes to them.

Note: In all the tests, subscriptions are created with (origin = NONE) as it is a bi-directional setup.

Workload Run:
---------------
- On node1, pgbench (read-write) with option "-b simple-update" is run on the set1 tables.
- On node2, pgbench (read-write) with option "-b simple-update" is run on the set2 tables.
- #clients = 40
- pgbench run duration = 10 minutes
- Results were measured over 3 runs of each case.

Test Runs:
- Six tests were done with a varying number of pub-sub pairs; the TPS reduction on both nodes for each case is:

| Case | # Pub-Sub Pairs | TPS Reduction |
| ---- | --------------- | ------------- |
| 01   | 30              | 0-1%          |
| 02   | 15              | 6-7%          |
| 03   | 5               | 7-8%          |
| 04   | 3               | 0-1%          |
| 05   | 2               | 14-15%        |
| 06   | 1 (no filters)  | 37-40%        |

- With appropriate row filters and distribution of the load across apply workers, the performance impact of the update_deleted patch can be minimized (an illustrative definition of one such pub-sub pair is sketched below).
- Just 3 pub-sub pairs are enough to keep TPS close to the baseline for the given workload.
- Poor distribution of the replication workload (e.g., only 1-2 pub-sub pairs) leads to higher overhead due to increased apply-worker contention.
~~~~
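For reference, the rough shape of one such pub-sub pair (one of the 30 used in case-01) is sketched below. The object names, the modulo-based filter predicate, the connection string, and copy_data = off are illustrative assumptions only; the attached scripts contain the actual definitions, and any patch-specific subscription options are omitted here.

    -- On node1 (publisher side of this pair): publish only a slice of
    -- pgbench_pub_accounts, e.g. one residue class of aid, so that each
    -- of the 30 pairs carries roughly 1/30th of the replicated changes.
    -- (aid is the primary key, so it is usable in the row filter for a
    -- publication that also publishes UPDATE/DELETE.)
    CREATE PUBLICATION pub_accounts_01
        FOR TABLE pgbench_pub_accounts WHERE (aid % 30 = 0);

    -- On node2 (subscriber side of this pair): origin = NONE prevents
    -- changes that node2 itself applied from being replicated back,
    -- which is required in a bi-directional setup.  copy_data = off
    -- assumes the tables are already initialized on both nodes.
    CREATE SUBSCRIPTION sub_accounts_01
        CONNECTION 'host=localhost port=5432 dbname=postgres'  -- placeholder
        PUBLICATION pub_accounts_01
        WITH (origin = none, copy_data = off);

    -- The mirror-image pairs (node2 publishing the set2 tables, node1
    -- subscribing) are defined the same way in the other direction.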
~~~~
Detailed results for all the above cases:

case-01:
---------
- Created 30 pub-sub pairs to distribute the replication load across 30 apply workers on each node.

Results:
 #run        pgHead_Node1_TPS   patched_Node1_TPS   pgHead_Node2_TPS   patched_Node2_TPS
 1           5633.377165        5579.244492         6385.839585        6482.775975
 2           5926.328644        5947.035275         6216.045707        6416.113723
 3           5522.804663        5542.380108         6541.031535        6190.123097
 median      5633.377165        5579.244492         6385.839585        6416.113723
 regression                     -1%                                    0%

- No regression observed.
~~~~
case-02:
---------
- #pub-sub pairs = 15

Results:
 #run        pgHead_Node1_TPS   patched_Node1_TPS   pgHead_Node2_TPS   patched_Node2_TPS
 1           8207.708475        7584.288026         8854.017934        8204.301497
 2           8120.979334        7404.735801         8719.451895        8169.697482
 3           7877.859139        7536.762733         8542.896669        8177.853563
 median      8120.979334        7536.762733         8719.451895        8177.853563
 regression                     -7%                                    -6%

- There was a 6-7% TPS reduction on both nodes, which seems to be within an acceptable range.
~~~~
case-03:
---------
- #pub-sub pairs = 5

Results:
 #run        pgHead_Node1_TPS   patched_Node1_TPS   pgHead_Node2_TPS   patched_Node2_TPS
 1           12325.90315        11664.7445          12997.47104        12324.025
 2           12060.38753        11370.52775         12728.41287        12127.61208
 3           12390.3677         11367.10255         13135.02558        12036.71502
 median      12325.90315        11370.52775         12997.47104        12127.61208
 regression                     -8%                                    -7%

- There was a 7-8% TPS reduction on both nodes, which seems to be within an acceptable range.
~~~~
case-04:
---------
- #pub-sub pairs = 3

Results:
 #run        pgHead_Node1_TPS   patched_Node1_TPS   pgHead_Node2_TPS   patched_Node2_TPS
 1           13186.22898        12464.42604         13973.8394         13370.45596
 2           13038.15817        13014.03906         13866.51966        13866.47395
 3           13881.10513        13868.71971         14687.67444        14516.33854
 median      13186.22898        13014.03906         13973.8394         13866.47395
 regression                     -1%                                    -1%

- No regression observed.
~~~~
case-05:
---------
- #pub-sub pairs = 2

Results:
 #run        pgHead_Node1_TPS   patched_Node1_TPS   pgHead_Node2_TPS   patched_Node2_TPS
 1           15936.98792        13563.98476         16734.35292        14527.22942
 2           16031.23003        13648.24979         16958.49609        14657.80008
 3           16113.79935        13550.68329         17029.5035         14509.84068
 median      16031.23003        13563.98476         16958.49609        14527.22942
 regression                     -15%                                   -14%

- The TPS reduced by 14-15% on both nodes.
~~~~
case-06:
---------
- #pub-sub pairs = 1; no row filter is used on either node

Results:
 #run        pgHead_Node1_TPS   patched_Node1_TPS   pgHead_Node2_TPS   patched_Node2_TPS
 1           22900.06507        13609.60639         23254.25113        14592.25271
 2           22110.98426        13907.62583         22755.89945        14805.73717
 3           22719.88901        13246.41484         23055.70406        14256.54223
 median      22719.88901        13609.60639         23055.70406        14592.25271
 regression                     -40%                                   -37%

- The regression observed is 37-40% on both nodes.
~~~~

Test-2: High concurrency
===========================
Highlight:
------------
Despite poor write distribution across the servers and highly concurrent updates, distributing the replication load across multiple apply workers limited the TPS drop to 15-18%.

Setup:
---------------
- 2 nodes (node1 and node2) are created with the same configuration as in Test-1.
- Both nodes have the same set of pgbench tables, initialized with scale=60 (small tables to increase concurrent updates).
- The nodes subscribe to each other for all changes:
  -- 15 pub-sub pairs are created using row filters to distribute the load, and all subscriptions are created with (origin = NONE).

Workload Run:
---------------
- On both nodes, the default pgbench (read-write) script is run on these tables.
- #clients = 15
- pgbench run duration = 5 minutes
- Results were measured over 2 runs.

Results:

Node1 TPS:
 #run        pgHead_Node1_TPS   patched_Node1_TPS
 1           9585.470749        7660.645249
 2           9442.364918        8035.531482
 median      9513.917834        7848.088366
 regression                     -18%

Node2 TPS:
 #run        pgHead_Node2_TPS   patched_Node2_TPS
 1           9485.232611        8248.783417
 2           9468.894086        7938.991136
 median      9477.063349        8093.887277
 regression                     -15%

- Under highly concurrent writes to the same small tables, contention increases and the TPS drop is 15-18% on both nodes. (Apply-worker activity and progress during such runs can be checked with the queries sketched below.)
~~~~
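Not part of the attached scripts, but for anyone reproducing these runs: the number of running subscription workers and the per-subscription progress on each node can be checked with standard statistics views such as pg_stat_subscription, e.g. (illustrative only):

    -- Running subscription workers (apply and table-sync) on this node.
    SELECT count(*) AS running_workers
      FROM pg_stat_subscription
     WHERE pid IS NOT NULL;

    -- Per-subscription progress; a growing gap between received_lsn and
    -- latest_end_lsn, or a stale last_msg_receipt_time, roughly indicates
    -- that the apply worker is falling behind.
    SELECT subname, pid, received_lsn, latest_end_lsn, last_msg_receipt_time
      FROM pg_stat_subscription;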
~~~~
The scripts used for the above tests are attached.

--
Thanks,
Nisha

<<attachment: bi_dir_test_scripts.zip>>